Abstract

This paper considers the stability of online learning algorithms and its implications for learnability
(bounded regret). We introduce a novel quantity called forward regret that intuitively measures how good an
online learning algorithm is if it is allowed a one-step look-ahead into the future. We show that given
stability, bounded forward regret is equivalent to bounded regret. We also show that the existence of an
algorithm with bounded regret implies the existence of a stable algorithm with bounded regret and bounded
forward regret. The equivalence results apply to general, possibly non-convex problems. To the best of our
knowledge, our analysis provides the first general connection between stability and regret in the online
setting that is not restricted to a particular class of algorithms. Our stability-regret connection provides
a simple recipe for analyzing the regret incurred by any online learning algorithm. Using our framework, we
analyze several existing online learning algorithms. Our proofs are simpler than existing analyses for the
respective algorithms, show a clear trade-off between stability and forward regret, and provide tighter
regret bounds in some cases. Furthermore, using our recipe, we analyze "approximate" versions of several
algorithms, such as regularized dual averaging (RDA) and follow-the-regularized-leader (FTRL), that require
solving an optimization problem at each step.

1 Introduction 

The fundamental role of stability in determining the generalization ability of learning algorithms in 
the setting of iid data is now well recognized. Moreover, our knowledge of the connection between 
stability and generalization is beginning to achieve a fair degree of maturity (see, for instance, [4, 
13, 18, 22]). However, the same cannot be said regarding our understanding of the role of stability 
in online adversarial learning. 

Recently, several results have shown connections between learnability of a concept class and stability 
of its empirical risk minimizer (ERM). Apart from their theoretical interest, such insights into stability
and learnability can potentially help in designing more practical algorithms. For example, [13]
show that in certain settings, stability is a more general characterization than VC-dimension:
good generalization performance can be guaranteed for concept classes with a stable ERM, even if their
VC-dimension is infinite.

However, most of the existing implications of stability are in the batch or i.i.d. learning setting, with 
only a few results in the online adversarial setting. Online learning can be modeled as a sequential 
two-player game between a player (learner) and an adversary where, at each step, the player takes an 
action from a set and the adversary plays a loss function. The player's loss is evaluated by applying
the adversary's move to the player's action, and the key quantity to control is the regret of the player in
hindsight. Understanding stability in the online learning setting is not only a challenging theoretical 
problem but is also important from the point of view of applications. For instance, stability allows us 
to derive guarantees that apply to dependent (non-iid) data [2] and is critical in areas such as privacy 
[11]. 

There is a fundamental challenge in extending the connection between stability and learnability 
from the iid to the online case. In the iid setting, empirical risk minimization (ERM) serves as a 
canonical learning algorithm [23]. Thus, given any hypothesis class, it is sufficient to just analyze the 
stability of ERM over the class to characterize its learnability in the batch setting. Unfortunately, no 
such canonical scheme is known for online learning, making it significantly more involved to forge 
connections between online learnability and stability.

In this paper, we circumvent the above-mentioned issue by studying, in a generic sense, connections between
stability and regret of learning algorithms, rather than the online learnability of individual concept
classes. To this end, we first define stability for online learning algorithms. Our definition is
essentially "leave last one out" stability, also considered by [20]. We also define a uniform version 
of this stability measure. However, stability alone cannot guarantee bounded regret. For example, 
an algorithm that always plays one fixed move is clearly the most stable any algorithm can be, but
its regret can grow linearly in T. Hence, an additional condition is required that forces the algorithm
to make progress. To this end, we introduce a novel measure called forward regret: the excess loss
incurred with a look-ahead of one time step (i.e., when the player makes its t-th move after seeing the
adversary's t-th move). We show fundamental results relating the three conditions, namely online
stability, bounded forward regret and bounded regret. First, assuming stability, bounded regret 
and bounded forward regret are equivalent. Second, given an algorithm with bounded regret, we 
can always obtain a stable algorithm with bounded regret and bounded forward regret. We would 
like to stress that these general results do not rely on convexity assumptions and are not restricted 
to a particular family of learning algorithms. In contrast, [20] provides equivalence of stability and 
regret for only certain families of algorithms and concept classes. 

We illustrate the usefulness of our general framework by considering several popular online learning 
algorithms like Follow-The-Leader (FTL) [10, 7], Follow-The-Regularized-Leader (FTRL) [17, 1], 
Implicit Online Learning (IOL) [12], Regularized Dual Averaging (RDA) [24] and Composite Ob- 
jective Mirror Descent (COMiD) [8]. We obtain regret bounds for all of them using the fundamental 
connections between forward regret and stability thereby demonstrating that our framework is not 
restricted to a particular class of algorithms. Our regret analysis is arguably simpler than existing 
ones and, in some cases such as IOL, provides tighter guarantees as well. 

Finally, we consider "approximate" versions of the RDA, IOL, and FTRL algorithms, where the
optimization problem at each step is solved only up to a small but non-zero additive error. Such an
analysis is important because, in practice, the optimization problems arising at each step are
not solved to infinite precision. For each of these three algorithms, we use our general stability-based
recipe to provide regret bounds for their approximate versions.

We recall Bregman divergences and strong convexity in Section 2 and introduce the online learning
setup in Section 3. We review existing work and contrast it with our work in Section 4. We introduce
our three online learning conditions and show their connections in Section 5. We provide several
illustrations of the usefulness of our conditions in analyzing existing online algorithms in Sections 6
and 7, and finally conclude with Section 9.

2 Bregman Divergences and Strong Convexity 

Here we recall the definition of a Bregman divergence [5, 6] which finds use in online learning 
algorithms. We also relate it to the notion of strong convexity, a key property behind many regret 
bounds for online learning. 

Definition 1. Let $R : C \to \mathbb{R}$ be a strictly convex function on a convex set $C \subseteq \mathbb{R}^d$. Also, let R be
differentiable on the relative interior of C, ri(C), assumed to be nonempty. The Bregman divergence
$D_R : C \times \mathrm{ri}(C) \to \mathbb{R}_+$ generated by the function R is given by

$$D_R(x, y) = R(x) - R(y) - \nabla R(y)^\top (x - y),$$

where $\nabla R(y)$ is the gradient of the function R at y.
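To make Definition 1 concrete, here is a minimal Python sketch (our own illustration, not part of the original analysis) that evaluates $D_R$ for two standard generating functions: $R(x) = \frac12\|x\|_2^2$, which yields half the squared Euclidean distance, and negative entropy, which yields the (generalized) KL divergence. Vectors are plain Python lists and all helper names are hypothetical.

```python
import math
from typing import Callable, List

Vec = List[float]

def bregman(R: Callable[[Vec], float], grad_R: Callable[[Vec], Vec], x: Vec, y: Vec) -> float:
    """D_R(x, y) = R(x) - R(y) - <grad R(y), x - y>, as in Definition 1."""
    g = grad_R(y)
    return R(x) - R(y) - sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))

# R(x) = 0.5 * ||x||_2^2 generates D_R(x, y) = 0.5 * ||x - y||_2^2.
def half_sq_norm(x: Vec) -> float:
    return 0.5 * sum(xi * xi for xi in x)

def grad_half_sq_norm(x: Vec) -> Vec:
    return list(x)

# Negative entropy on the positive orthant generates sum_i x_i log(x_i / y_i) - x_i + y_i,
# which is the usual KL divergence when x and y are probability vectors.
def neg_entropy(x: Vec) -> float:
    return sum(xi * math.log(xi) for xi in x)

def grad_neg_entropy(x: Vec) -> Vec:
    return [math.log(xi) + 1.0 for xi in x]

x, y = [0.2, 0.8], [0.5, 0.5]
d_euclid = bregman(half_sq_norm, grad_half_sq_norm, x, y)   # approximately 0.09
d_kl = bregman(neg_entropy, grad_neg_entropy, x, y)         # approximately KL(x || y)
```

Passing the generator and its gradient separately keeps the helper agnostic to how R is represented; any differentiable convex R can be plugged in the same way.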
Definition 2. A convex function $f : \mathbb{R}^d \to \mathbb{R}$ is strongly convex with respect to a norm $\|\cdot\|$ if there
exists a constant $\alpha > 0$ such that

$$D_f(u, v) \ge \frac{\alpha}{2}\|u - v\|^2 \quad \forall u, v \in \mathbb{R}^d.$$

$\alpha$ is called the modulus of strong convexity and f is also referred to as $\alpha$-strongly convex.

Now, we present a useful lemma characterizing optima of a strongly convex function. 

Lemma 3. Let $f : \mathbb{R}^d \to \mathbb{R}$ be an $\alpha$-strongly convex function and let $C \subseteq \mathbb{R}^d$ be a convex set. Let
$w^* \in C$ be a minimizer of f over C, i.e., $w^* = \arg\min_{w \in C} f(w)$. Then, for any $u \in C$,

$$f(u) \ge f(w^*) + \frac{\alpha}{2}\|u - w^*\|^2.$$

In particular, the minimizer is unique. 
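Lemma 3 can be sanity-checked numerically. The sketch below is our own illustration under stated assumptions: it takes the quadratic $f(w) = \frac{\alpha}{2}\|w - c\|_2^2$ (which is $\alpha$-strongly convex w.r.t. $\|\cdot\|_2$) and the box $C = [-1, 1]^d$, where the constrained minimizer is the coordinate-wise clipping of c, and then tests the inequality at random points of C. The helper names are hypothetical.

```python
import random

def f(w, c, alpha):
    """alpha-strongly convex (w.r.t. ||.||_2) quadratic f(w) = (alpha/2) * ||w - c||_2^2."""
    return 0.5 * alpha * sum((wi - ci) ** 2 for wi, ci in zip(w, c))

def project_box(v, lo=-1.0, hi=1.0):
    """Euclidean projection onto the box C = [lo, hi]^d (coordinate-wise clipping)."""
    return [min(max(vi, lo), hi) for vi in v]

def check_lemma3(alpha=2.0, d=3, trials=1000, seed=0):
    """Returns True if f(u) >= f(w*) + (alpha/2) ||u - w*||^2 at every sampled u in C."""
    rng = random.Random(seed)
    c = [rng.uniform(-3.0, 3.0) for _ in range(d)]   # unconstrained minimizer, possibly outside C
    w_star = project_box(c)                            # minimizer of f over the box
    for _ in range(trials):
        u = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        gap = f(u, c, alpha) - f(w_star, c, alpha)
        dist2 = sum((ui - wi) ** 2 for ui, wi in zip(u, w_star))
        if gap < 0.5 * alpha * dist2 - 1e-9:          # small tolerance for rounding
            return False
    return True
```

Note that the check is meaningful even when c lies outside C: the gradient of f at $w^*$ is then nonzero, yet the quadratic growth around the constrained minimizer still holds, which is exactly what the lemma asserts.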

Bold lowercase letters (e.g., w, μ) denote vectors, and $w_i$ denotes the i-th component of w. The
Euclidean dot product between a and b is denoted by $a^\top b$ or $\langle a, b\rangle$. A general norm is denoted by
$\|\cdot\|$ and $\|\cdot\|_*$ refers to its dual norm. For most of this paper, we work with arbitrary norms and we
use $\|\cdot\|_p$ to refer to a specific $\ell_p$ norm. Unless specified otherwise, $w \in \mathbb{R}^d$, $C \subseteq \mathbb{R}^d$ is a compact
convex set, and $\ell_t : \mathbb{R}^d \to \mathbb{R}$ is any loss function. A function $f : C \to \mathbb{R}$ is L-Lipschitz continuous
w.r.t. a norm $\|\cdot\|$ if $|f(x) - f(y)| \le L\|x - y\|$, $\forall x, y \in C$.

3 Setup 

We now describe the online learning setup that we use in this paper. Let $C \subseteq \mathbb{R}^d$ be a fixed set and
$\mathcal{L}$ be a class of real-valued functions over C. Now, consider a repeated game of T rounds played
between a player/learner and an adversary. At every step t,

• The player plays a point $w_t$ from the set C.

• The adversary responds with a function $\ell_t \in \mathcal{L}$.

• The player suffers loss $\ell_t(w_t)$.

The quantity of interest in online learning is the regret, which measures how well the player performs
compared to the best fixed move in hindsight (i.e., knowing all the moves of the adversary in
advance). Regret is defined below in (6). The goal in online learning is to minimize the regret
regardless of the function sequence played by the adversary. Online Convex Programming (OCP) [25]
(respectively, Online Linear Programming (OLP)) is a special case of the online learning game above
where the set C is a compact convex set and $\mathcal{L}$ is a class of convex (respectively, linear) functions
defined on C.
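The protocol above can be sketched in a few lines of Python. This is an illustrative assumption-laden sketch, not part of the paper: we take $C = [-1, 1]$, represent losses as Python callables, and approximate the best fixed point in hindsight on a finite grid. The usage example also shows the point made in the introduction: a player that never moves is perfectly stable but suffers regret linear in T.

```python
from typing import Callable, List, Sequence

Loss = Callable[[float], float]

def run_protocol(player: Callable[[List[Loss]], float], losses: Sequence[Loss]) -> List[float]:
    """Plays the repeated game; before choosing w_t the player sees only l_1, ..., l_{t-1}."""
    revealed: List[Loss] = []
    moves: List[float] = []
    for loss in losses:
        moves.append(player(revealed))   # player commits to w_t
        revealed.append(loss)            # adversary then reveals l_t; the player suffers loss(w_t)
    return moves

def regret_on_grid(moves: Sequence[float], losses: Sequence[Loss], grid: Sequence[float]) -> float:
    """Regret against the best fixed point in hindsight, with the min over C taken on a finite grid."""
    incurred = sum(loss(w) for loss, w in zip(losses, moves))
    best_fixed = min(sum(loss(u) for loss in losses) for u in grid)
    return incurred - best_fixed

# Usage: the adversary always plays l_t(w) = |w - 1| on C = [-1, 1].
T = 10
losses = [lambda w: abs(w - 1.0)] * T
grid = [i / 10 for i in range(-10, 11)]
moves = run_protocol(lambda revealed: 0.0, losses)   # a player stuck at w = 0
r = regret_on_grid(moves, losses, grid)              # 10.0, i.e., one unit of regret per round
```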

4 Related Work 

For a general introduction to online learning and descriptions of standard algorithms, see [7]. In 
the iid setting, stability is investigated from various points of view in [4, 13, 18, 22]. There are 
only a few papers dealing with stability in the online setting. Recently, [20] defined what we call
Last Leave-One-Out (LLOO) stability and showed that for FTRL or MD type methods, stable online
learning algorithms have bounded regret. In contrast, we distill out the "progress" in terms of a
forward regret condition and show a much more general connection between stability, regret and
forward regret. Unlike [20], our method is extremely generic and does not need to assume any specific
algorithmic form or even any specific function class (like convex functions). We also prove
that most existing families of online learning algorithms are in fact stable in our sense and, using our
connections, provide simple regret bound analyses for them. Another related work [16] considers an
online algorithm, namely the stochastic gradient descent (SGD) algorithm, in the iid setting where each
function $\ell_t$ is sampled in an iid fashion from some distribution. In this setting, [16] defines
a new notion of online stability which is motivated by uniform stability [4]. The paper shows that
SGD satisfies the new notion of stability and provides consistency guarantees as well. In contrast,
our fundamental results connecting stability with regret hold for any algorithm and for any set of
adversary moves $\{\ell_t\}$, not just those sampled iid from a distribution.

A general class of online learning algorithms is referred to as Follow-The-Leader (FTL) [7]
algorithms. At step t + 1, this algorithm chooses the element of C which minimizes the sum of the
functions played by the adversary up to that point:

$$w_{t+1} = \arg\min_{w \in C} \sum_{\tau=1}^{t} \ell_\tau(w). \tag{1}$$

It can be shown that this surprisingly simple algorithm achieves $O(\log T)$ regret when the adversary is
restricted to playing strongly convex functions [10].

A generalization of FTL is obtained by adding a regularizer, which results in the
Follow-The-Regularized-Leader (FTRL) algorithm [17, 1]. In this case the update is given by

$$w_{t+1} = \arg\min_{w \in C} \sum_{\tau=1}^{t} \eta\,\ell_\tau(w) + R(w). \tag{2}$$

Typically, R is a strongly convex regularizer with respect to the appropriate norm and $\eta$ is a tradeoff
parameter. Another way of describing FTRL algorithms is using Bregman divergences [17]. In
particular, by defining $\phi_0(w) = R(w)$ and $\phi_t(w) = \phi_{t-1}(w) + \eta\,\ell_t(w)$, we can write the FTRL
update in an equivalent form:

$$w_{t+1} = \arg\min_{w \in C}\ \eta\,\ell_t(w) + D_{\phi_{t-1}}(w, \tilde{w}_t),$$

where $\tilde{w}_t = \arg\min_{w \in \mathbb{R}^d} \phi_{t-1}(w)$ is the corresponding unconstrained minimizer.
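The role of the regularizer in (2) is easiest to see on the classic one-dimensional example with linear losses $\ell_t(w) = z_t w$ on $C = [-1, 1]$, where $z_1 = 0.5$ and $z_t$ then alternates between $-1$ and $+1$. The sketch below is our own illustration (the sequence, the regularizer $R(w) = w^2/2$ and $\eta = 1/\sqrt{T}$ are choices made for the example): FTL jumps between the endpoints of C at every step and pays linear regret, while FTRL moves by at most $\eta$ per step.

```python
import math

def linear_losses(T: int):
    """z_1 = 0.5, then z_t = -1 for even t and +1 for odd t; losses are l_t(w) = z_t * w on [-1, 1]."""
    return [0.5] + [(-1.0 if t % 2 == 0 else 1.0) for t in range(2, T + 1)]

def play(zs, next_move):
    """Runs a rule mapping S_t = z_1 + ... + z_t to w_{t+1}; returns the moves and the regret."""
    S, w, moves = 0.0, next_move(0.0), []
    for z in zs:
        moves.append(w)
        S += z
        w = next_move(S)
    # the best fixed point in [-1, 1] for the summed linear loss S_T * w earns -|S_T|
    regret = sum(z * w for z, w in zip(zs, moves)) + abs(S)
    return moves, regret

def ftl(S):
    # FTL (1): argmin_{w in [-1, 1]} S * w is an endpoint of C (we pick w = 0 when S == 0)
    return 0.0 if S == 0 else -math.copysign(1.0, S)

def make_ftrl(eta):
    # FTRL (2) with R(w) = w^2 / 2: argmin eta * S * w + w^2 / 2, clipped to [-1, 1]
    return lambda S: min(1.0, max(-1.0, -eta * S))

T = 100
zs = linear_losses(T)
ftl_moves, ftl_regret = play(zs, ftl)                           # regret 99.5
ftrl_moves, ftrl_regret = play(zs, make_ftrl(1.0 / math.sqrt(T)))  # regret close to 5.45
```

The contrast matches the intuition developed later in the paper: adding a strongly convex regularizer buys stability, and stability is what turns FTL's (non-positive) forward regret into bounded regret.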

Another class of algorithms is the proximal-type algorithms, also called Mirror Descent (MD)
methods [15], which typically try to find an iterate that is close to the previous iterate but also
minimizes the current loss function. Similar to FTRL, such algorithms achieve $O(\sqrt{T})$ regret for
general convex functions and $O(\ln T)$ regret for strongly convex functions. It is interesting to note that
Zinkevich's algorithm [25] is just a special case of mirror descent with the Euclidean norm and
$R(w) = \frac{1}{2}\|w\|^2$, and is similar to a stochastic gradient descent update [3].

While mirror descent and FTRL look like fundamentally different algorithms and were considered to be
two different ends of the spectrum of online learning algorithms [21], a recent paper [14] shows an
equivalence between different mirror descent algorithms and their corresponding FTRL counterparts. In
particular, they show that the FOBOS mirror descent algorithm [9] is conceptually similar to Regularized
Dual Averaging (RDA) [24], with minor differences emanating from the use of a proximal strongly convex
regularizer and the handling of arbitrary nonsmooth regularization like the $\ell_1$ norm.
These differences result in different sparsity properties of the two algorithms.

5 Three conditions for online learning 

In this section, we formally define our stability notion as well as introduce our bounded forward 
regret condition. We show that given stability, bounded regret and bounded forward regret are 
equivalent. Moreover, any algorithm with bounded regret can be converted into a stable algorithm 
with bounded regret and forward regret. Finally, we consider several existing OCP algorithms and
illustrate that our forward regret and stability conditions can be used to provide a simple recipe for
proving regret bounds. For each of the algorithms, our novel analysis simplifies existing analyses
significantly and in some cases also tightens them.

We first define the following three quantities for any online learning algorithm: 

• Online Stability: Intuitively, an online algorithm A is defined to be stable if the consecutive
iterates generated by A are not too far away from each other. Formally, if $w_t$ is the
point selected by A at the t-th step, then the (cumulative) online stability of A is given by

$$S_A(T) = \sum_{t=1}^{T} \|w_t - w_{t+1}\|. \tag{3}$$

If $S_A(T) = o(T)$, then we say that A is online stable. This notion of stability is closely related
to that of [20] (see Definition 17). Next, we define a stronger notion of stability, which we call
Uniform Stability:

$$US_A(t) = \|w_t - w_{t+1}\|. \tag{4}$$

If $US_A(t) = o(1)$, then A is defined to be uniformly stable. Clearly, if A is uniformly
stable then it is (cumulatively) stable as well. In Section 7, we show that most of the
existing online learning methods are actually uniformly stable. Interestingly, for COMiD
(see Section 7.4), while proving cumulative stability is relatively straightforward, one can
show that uniform stability need not hold in general.
• Forward Regret: Forward regret is the hypothetical regret incurred by A if it had access
to the next move that the adversary was going to make. Note that forward regret cannot
actually be attained by an algorithm since it depends on seeing one step into the future.
Formally,

$$\mathcal{FR}_A(T) = \sum_{t=1}^{T} \left[\ell_t(w_{t+1}) - \ell_t(w^*)\right], \tag{5}$$

where $w^* = \arg\min_{w \in C} \sum_{t=1}^{T} \ell_t(w)$. We define A to have bounded (or vanishing) forward
regret if $\mathcal{FR}_A(T) = o(T)$. Note that if the online algorithms are randomized, we can
replace the three quantities with their expected counterparts and all the bounds in the paper
still hold.

• Regret: Regret is a standard notion in online learning that measures how good the steps of
the algorithm A are compared to the best fixed point in hindsight:

$$\mathcal{R}_A(T) = \sum_{t=1}^{T} \left[\ell_t(w_t) - \ell_t(w^*)\right]. \tag{6}$$

Here again, if $\mathcal{R}_A(T) = o(T)$, then A is said to have bounded (or vanishing) regret.
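The three quantities differ only in which iterate each loss is charged at, which a short sketch makes explicit. The Python below is our own illustration for one-dimensional iterates (the function names are hypothetical); `ws` holds $w_1, \ldots, w_{T+1}$, since forward regret needs the iterate after the last round.

```python
from typing import Callable, List, Sequence

Loss = Callable[[float], float]

def online_stability(ws: Sequence[float]) -> float:
    """S_A(T) from (3); ws holds w_1, ..., w_{T+1}."""
    return sum(abs(a - b) for a, b in zip(ws, ws[1:]))

def uniform_stability(ws: Sequence[float]) -> List[float]:
    """US_A(t) from (4) for t = 1, ..., T."""
    return [abs(a - b) for a, b in zip(ws, ws[1:])]

def forward_regret(ws: Sequence[float], losses: Sequence[Loss], w_star: float) -> float:
    """FR_A(T) from (5): l_t is charged at the *next* iterate w_{t+1}."""
    return sum(loss(ws[t + 1]) - loss(w_star) for t, loss in enumerate(losses))

def regret(ws: Sequence[float], losses: Sequence[Loss], w_star: float) -> float:
    """R_A(T) from (6): l_t is charged at the iterate w_t actually played."""
    return sum(loss(ws[t]) - loss(w_star) for t, loss in enumerate(losses))

# Two rounds with 1-Lipschitz losses; every point in [0.5, 1] minimizes the total loss.
ws = [0.0, 1.0, 0.5]
losses = [lambda w: abs(w - 1.0), lambda w: abs(w - 0.5)]
w_star = 0.75
S = online_stability(ws)                  # 1.5
FR = forward_regret(ws, losses, w_star)   # -0.5
R = regret(ws, losses, w_star)            # 1.0
```

In this tiny example the forward regret is negative (looking one step ahead beats the best fixed point), and the regret equals $L \cdot S + \mathcal{FR}$ with $L = 1$, which is the relation formalized in the next subsection.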

These three concepts, besides being important in their own right, are also intimately related. In
particular, in the next subsection we show that given any two of these conditions, the third condition
holds.

5.1 Connections between the three conditions 

In this section, we show that the three conditions (i.e., stability, bounded forward regret and
bounded regret) defined in the previous section are closely related in the sense that given any two
of the conditions, the third condition follows directly. For our claim, we first show that, assuming
stability,

$$\text{bounded forward regret} \iff \text{bounded regret}.$$

We then prove that any algorithm with bounded regret can be converted into a stable algorithm, albeit
with worse rates of regret. Our claims are formalized in the following theorems.

Theorem 4. Assume an online algorithm A satisfies the condition of online stability (3), where the
function played by the adversary at each step is L-Lipschitz. Then, we have

$$\mathcal{R}_A(T) \le L \cdot S_A(T) + \mathcal{FR}_A(T), \tag{7}$$
$$\mathcal{FR}_A(T) \le L \cdot S_A(T) + \mathcal{R}_A(T).$$

Therefore, assuming online stability of A, bounded forward regret and bounded regret are equivalent
conditions.

Proof. We first assume that A has online stability and bounded forward regret. We have

$$\sum_{t=1}^{T} \left[\ell_t(w_t) - \ell_t(w^*)\right] = \sum_{t=1}^{T} \left[\ell_t(w_t) - \ell_t(w_{t+1})\right] + \sum_{t=1}^{T} \left[\ell_t(w_{t+1}) - \ell_t(w^*)\right]$$
$$\le \sum_{t=1}^{T} L\|w_t - w_{t+1}\| + \mathcal{FR}(T) = L \cdot S(T) + \mathcal{FR}(T) = o(T),$$

where the inequality follows from the Lipschitz continuity of $\ell_t$ and the last equality holds
as both $S(T), \mathcal{FR}(T) = o(T)$. Hence, A has bounded regret. The proof in the reverse direction
follows identically. □
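Since the proof only splits each term at $w_{t+1}$ and applies Lipschitz continuity, both inequalities in (7) hold for any iterate sequence and any comparator. The following sketch (our own illustration; the losses $\ell_t(w) = L|w - z_t|$ and the random iterates are assumptions for the example) checks them numerically.

```python
import random

def check_theorem4(T=200, L=3.0, seed=1):
    """Checks both inequalities of (7) for arbitrary iterates and L-Lipschitz losses on [-1, 1]."""
    rng = random.Random(seed)
    zs = [rng.uniform(-1.0, 1.0) for _ in range(T)]
    ws = [rng.uniform(-1.0, 1.0) for _ in range(T + 1)]     # arbitrary iterates w_1, ..., w_{T+1}
    loss = lambda w, z: L * abs(w - z)                        # L-Lipschitz in w
    # grid approximation of w*; the inequalities hold for any comparator anyway
    w_star = min((i / 100 for i in range(-100, 101)),
                 key=lambda u: sum(loss(u, z) for z in zs))
    S = sum(abs(ws[t] - ws[t + 1]) for t in range(T))
    R = sum(loss(ws[t], zs[t]) - loss(w_star, zs[t]) for t in range(T))
    FR = sum(loss(ws[t + 1], zs[t]) - loss(w_star, zs[t]) for t in range(T))
    return R <= L * S + FR + 1e-9 and FR <= L * S + R + 1e-9
```

Because random iterates are wildly unstable, $S(T)$ here is of order T and the bounds are loose; the theorem only becomes informative once $S(T) = o(T)$.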

To complete the picture regarding the connections between the three conditions, we now prove the 
following theorem. 

Theorem 5. Let C be a fixed set of bounded diameter D from which a learner A selects a point at
each step of online learning. Let $\mathcal{F}$ be the class of L-Lipschitz functions from which the adversary
plays a function at each step. Also, let A have bounded regret. Then, there exists a stable algorithm
with bounded regret and bounded forward regret.
Proof. Intuitively, our proof proceeds by constructing an alternative stable algorithm that averages
a batch of loss functions and feeds the average into the "unstable" but bounded-regret algorithm A. We
then show bounded regret and forward regret of this new algorithm. Note that our proof strategy
is inspired by the proof of Lemma 20 in [22], which shows stability to be a necessary condition for
learnability in the batch setting.

Formally, given the algorithm A, we construct a new algorithm A' in the following way. We divide
the time steps into batches of size B, and A' repeats the same point throughout an entire batch. At the end
of the batch, it feeds the average of the functions in the batch to A to get its next move. It then sticks
to this new point for the next B time steps before repeating the process all over. In picture,

$$\underbrace{\ell_1, \ell_2, \ldots, \ell_B}_{B\,g_1},\ \underbrace{\ell_{B+1}, \ldots, \ell_{2B}}_{B\,g_2},\ \ldots, \qquad g_i = \frac{1}{B}\sum_{t=(i-1)B+1}^{iB} \ell_t.$$

Note that the function $g_i$, being an average of Lipschitz functions, is itself Lipschitz. Denote the
elements generated by A' as $w'_1, \ldots, w'_T$ and those generated by A as $w_1, \ldots, w_{\lceil T/B \rceil}$. Note that there are
only $\lceil T/B \rceil$ distinct elements in the sequence $w'_1, \ldots, w'_T$: viz. the elements generated by A
in response to $g_1, \ldots, g_{\lceil T/B \rceil}$. The stability analysis of A' now proceeds as follows:

$$\sum_{t=1}^{T} \|w'_t - w'_{t+1}\| = \sum_{i=1}^{\lceil T/B \rceil} \|w'_{(i-1)B+1} - w'_{iB+1}\| = \sum_{i=1}^{\lceil T/B \rceil} \|w_i - w_{i+1}\| \le \left\lceil \frac{T}{B} \right\rceil D = o(T),$$

for the choice $B = \Theta(\sqrt{T})$ in particular. This proves that A' is stable.

In order to show that A' has bounded regret, we consider

$$\sum_{t=1}^{T} \left(\ell_t(w'_t) - \ell_t(w^*)\right) \le \sum_{t=1}^{B\lfloor T/B \rfloor} \left(\ell_t(w'_t) - \ell_t(w^*)\right) + L \cdot D \cdot B = \sum_{i=1}^{\lfloor T/B \rfloor} \sum_{t=(i-1)B+1}^{iB} \left(\ell_t(w_i) - \ell_t(w^*)\right) + L \cdot D \cdot B$$
$$= B \sum_{i=1}^{\lfloor T/B \rfloor} \left(g_i(w_i) - g_i(w^*)\right) + L \cdot D \cdot B \le B \cdot \mathcal{R}_A(\lfloor T/B \rfloor) + L \cdot D \cdot B,$$

where $\mathcal{R}_A(T) = o(T)$ as A has bounded regret. The last term in the first inequality is an upper
bound on the regret due to the last (possibly incomplete) batch of functions (at most B in number).
Selecting $B = \sqrt{T}$, we get $\mathcal{R}_A(\lfloor T/B \rfloor) = o(\sqrt{T})$, and hence the above bound is o(T), i.e., A' has bounded
regret. □
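The construction of A' is essentially a wrapper around A, sketched below in Python as our own illustration. The interface is an assumption: the base algorithm is modeled as a function `base_predict(history)` that maps the list of averaged losses fed to it so far to its next move; the class name and the grid-based FTL used in the usage example are hypothetical stand-ins.

```python
from typing import Callable, List

Loss = Callable[[float], float]

class BatchedLearner:
    """Hypothetical sketch of the wrapper A' from the proof of Theorem 5.

    `base_predict(history)` stands for the bounded-regret algorithm A: it maps the
    averaged losses g_1, ..., g_k fed to it so far to its next move.
    """

    def __init__(self, base_predict: Callable[[List[Loss]], float], B: int):
        self.base_predict = base_predict
        self.B = B
        self.averaged: List[Loss] = []   # g_1, g_2, ... fed to A
        self.batch: List[Loss] = []      # losses observed in the current batch
        self.current = base_predict([])  # A's first move, replayed for B rounds

    def predict(self) -> float:
        return self.current

    def update(self, loss: Loss) -> None:
        self.batch.append(loss)
        if len(self.batch) == self.B:
            fs = self.batch
            # g_i(w) = (1/B) * sum of the B losses in batch i
            self.averaged.append(lambda w, fs=fs: sum(f(w) for f in fs) / len(fs))
            self.batch = []
            self.current = self.base_predict(self.averaged)   # A moves once per batch

# Usage with a stand-in base algorithm: FTL restricted to a finite grid of C = [-1, 1].
grid = [i / 10 for i in range(-10, 11)]

def ftl_on_grid(history: List[Loss]) -> float:
    if not history:
        return 0.0
    return min(grid, key=lambda u: sum(g(u) for g in history))

T, B = 100, 10
learner = BatchedLearner(ftl_on_grid, B)
moves = []
for t in range(T):
    moves.append(learner.predict())
    z = 1.0 if t % 2 else -1.0
    learner.update(lambda w, z=z: abs(w - z))
switches = sum(1 for a, b in zip(moves, moves[1:]) if a != b)   # at most T // B - 1
```

Whatever the base algorithm does, A' can change its move only at batch boundaries, which is exactly why its stability is bounded by $\lceil T/B \rceil D$ in the proof above.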



Thus, given any algorithm with bounded regret, we can convert it into another online stable
algorithm with bounded regret, which by Theorem 4 also has bounded forward regret.

6 Unified analysis of online algorithms 

In this section we present examples where existing online learning algorithms can be analyzed 
through our stability and forward regret conditions and hence lead to regret bounds directly (see 
Theorem 4). These examples illustrate that the stability and forward regret conditions are critical 
to regret analysis and in fact provide a fairly straightforward recipe for regret analysis of online 
learning algorithms. Note that, unlike the general results of Section 5, here we will make convexity
assumptions on C and $\ell_t$. One of the major contributions of this paper is that our analysis significantly
simplifies as well as tightens up the analysis of existing methods like IOL [12].

Before delving into the technical details, we provide a brief generic sketch of the regret analysis of
all the algorithms.

For each of the regret analyses, we first bound the stability $\sum_t \|w_t - w_{t+1}\|$ in terms of the
learning rate $\eta$ and the Lipschitz constant L of $\ell_t$. The bounds on stability are generally obtained
by exploiting the optimality of $w_{t+1}$ at iteration (t + 1), the Lipschitz continuity of $\ell_t$ and the strong
convexity of the regularizer R (for the algorithms involving regularization). For the case of IOL,
$\|w_t - w_{t+1}\| \le 2L\eta_t$, which makes the stability bounded by $2L\sum_t \eta_t$.

For FTL, forward regret is non-positive by definition of the FTL updates. For all the other algorithms,
the bounds on the forward regret follow by again using the optimality of $w_{t+1}$ at the (t + 1)-th iteration
and comparing the corresponding objective at the final minimizer $w^*$. This generally results in a
telescoping sum, upper bounding the forward regret in terms of the regularizer R (or the Bregman
divergence $D_R$) evaluated at $w^*$ and the initial iterate $w_1$, with all the other terms canceling out
by appropriately choosing $\eta_t$. In particular, for the case of IOL, the forward regret is bounded by



Finally, bounds on the regret are obtained by using (7), while the optimal dependence on
T is obtained by trading off the step size $\eta_t$ in the corresponding inequality. Appropriate choices of
$\eta_t$ give $O(\log T)$ rates of regret for strongly convex $\ell_t$ and $O(\sqrt{T})$ rates of regret
for general convex Lipschitz $\ell_t$, as is common in the literature.
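The final trade-off step in this recipe is a one-line calculus exercise. As a generic sketch, suppose the combined bound from (7) has the form $A/\eta + B\eta$ with placeholder constants $A, B > 0$ (for instance $A = \|\nabla R\|_* D$ and $B = L^2 T$ in the FTRL analysis of Section 7.2). Since this function is convex in $\eta > 0$, its stationary point is the minimizer:

```latex
\begin{aligned}
\phi(\eta) &= \frac{A}{\eta} + B\eta, \qquad A, B > 0,\\
\phi'(\eta) &= -\frac{A}{\eta^2} + B = 0 \;\Longrightarrow\; \eta^\star = \sqrt{A/B},\\
\phi(\eta^\star) &= \sqrt{AB} + \sqrt{AB} = 2\sqrt{AB}.
\end{aligned}
```

With $A = \|\nabla R\|_* D$ and $B = L^2 T$ this gives $\eta^\star = \sqrt{\|\nabla R\|_* D}/(L\sqrt{T})$ and the value $2L\sqrt{\|\nabla R\|_* D}\,\sqrt{T}$.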

7 Examples 

7.1 Follow The Leader (FTL) 

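Follow the leader (FTL) is a popular method for OCP when the provided functions are strongly
convex. At the t-th step, FTL chooses $w_{t+1} \in C$ to be the element that minimizes the total loss up to
that step, i.e.,

$$\text{FTL:}\quad w_{t+1} = \arg\min_{w \in C} \sum_{\tau=1}^{t} \ell_\tau(w). \tag{8}$$

The FTL method was analyzed in [7] and [21] for the case when each loss function $\ell_t$ is at least
$\alpha$-strongly convex. Here, using our forward regret and stability conditions, we provide a significantly
simpler analysis with similar regret bounds. It should be noted that our analysis is a generalization
of the analysis in [7, Section 3.2] from strongly convex functions w.r.t. the $\ell_2$ norm to strongly convex
functions w.r.t. an arbitrary norm.

Theorem 6. Let each loss function $\ell_t$ be $\alpha$-strongly convex and L-Lipschitz continuous. Then, the
regret incurred by the FTL algorithm (see (8)) is bounded by:

$$\mathcal{R}_{FTL}(T) \le \frac{2L^2}{\alpha}(1 + \ln T).$$

Proof. Our proof follows the simple recipe of computing stability as well as forward regret bounds.

Stability: Using strong convexity (the sum $\sum_{\tau=1}^{t} \ell_\tau$ is $t\alpha$-strongly convex), Lemma 3 and the fact that
$w_{t+1}$ is the optimum of (8),

$$\sum_{\tau=1}^{t} \ell_\tau(w_t) \ge \sum_{\tau=1}^{t} \ell_\tau(w_{t+1}) + \frac{t\alpha}{2}\|w_t - w_{t+1}\|^2. \tag{9}$$

Similarly, using optimality of $w_t$ for the (t - 1)-th step:

$$\sum_{\tau=1}^{t-1} \ell_\tau(w_{t+1}) \ge \sum_{\tau=1}^{t-1} \ell_\tau(w_t) + \frac{(t-1)\alpha}{2}\|w_t - w_{t+1}\|^2. \tag{10}$$

Adding (9) and (10), we get:

$$\ell_t(w_t) - \ell_t(w_{t+1}) \ge \left(t - \tfrac{1}{2}\right)\alpha\|w_t - w_{t+1}\|^2. \tag{11}$$

Using (11) and the Lipschitz continuity of $\ell_t$, $L\|w_t - w_{t+1}\| \ge (t - \frac{1}{2})\alpha\|w_t - w_{t+1}\|^2$, i.e.,

$$\|w_t - w_{t+1}\| \le \frac{L}{(t - 1/2)\alpha} \le \frac{2L}{t\alpha}. \tag{12}$$

Hence,

$$S_{FTL}(T) = \sum_{t=1}^{T} \|w_t - w_{t+1}\| \le \frac{2L}{\alpha}\sum_{t=1}^{T}\frac{1}{t} \le \frac{2L}{\alpha}(1 + \ln T). \tag{13}$$

Forward Regret: By optimality of $w_{T+1}$,

$$\sum_{t=1}^{T} \ell_t(w^*) \ge \sum_{t=1}^{T} \ell_t(w_{T+1}). \tag{14}$$

Next, using (10) for t = T and (14),

$$\sum_{t=1}^{T} \ell_t(w^*) \ge \ell_T(w_{T+1}) + \sum_{t=1}^{T-1} \ell_t(w_T). \tag{15}$$

Similarly, using (10) with (15) for $t = T-1, \ldots, 2$,

$$\sum_{t=1}^{T} \ell_t(w^*) \ge \sum_{t=1}^{T} \ell_t(w_{t+1}). \tag{16}$$

Hence,

$$\mathcal{FR}_{FTL}(T) \le 0. \tag{17}$$

Hence, using Theorem 4, (13), and (17),

$$\mathcal{R}_{FTL}(T) \le \frac{2L^2}{\alpha}(1 + \ln T). \tag{18}$$

□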
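The three ingredients of this proof (non-positive forward regret, $O(\ln T)$ stability, and the resulting regret bound) can be checked on a simple instance. The sketch below is our own illustration with assumed losses $\ell_t(w) = \frac{\alpha}{2}(w - z_t)^2$ and $z_t \in C = [-1, 1]$; for these losses FTL is the running mean of the $z_t$, and $L = 2\alpha$ is a valid Lipschitz constant on C.

```python
import math
import random

def ftl_quadratic(zs):
    """FTL (8) for l_t(w) = (alpha/2) * (w - z_t)^2 with every z_t in C = [-1, 1].

    The running mean of z_1..z_t minimizes the summed losses and already lies in C,
    so no projection is needed. w_1 = 0 is an arbitrary first move.
    """
    ws, total = [0.0], 0.0
    for t, z in enumerate(zs, start=1):
        total += z
        ws.append(total / t)
    return ws

def ftl_experiment(T=500, alpha=1.0, seed=0):
    """Returns stability S, forward regret FR, regret R and the Lipschitz constant L used."""
    rng = random.Random(seed)
    zs = [rng.uniform(-1.0, 1.0) for _ in range(T)]
    loss = lambda w, z: 0.5 * alpha * (w - z) ** 2
    L = 2.0 * alpha                     # |l_t'(w)| = alpha * |w - z_t| <= 2 * alpha on C
    ws = ftl_quadratic(zs)
    w_star = sum(zs) / T                # minimizer of the total loss
    S = sum(abs(ws[t] - ws[t + 1]) for t in range(T))
    R = sum(loss(ws[t], zs[t]) - loss(w_star, zs[t]) for t in range(T))
    FR = sum(loss(ws[t + 1], zs[t]) - loss(w_star, zs[t]) for t in range(T))
    return S, FR, R, L

S, FR, R, L = ftl_experiment()
```

On this instance one should see $\mathcal{FR} \le 0$ as in (17), $S$ below the bound (13), and the regret below (18).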

7.2 Follow The Regularized Leader (FTRL)

While FTL is an intuitive algorithm, unfortunately, for non-strongly convex functions it need not
have bounded regret. However, several recent results show that by adding a strongly convex regularizer,
FTL can be used to obtain bounded regret. Specifically,

$$\text{FTRL:}\quad w_{t+1} = \arg\min_{w \in C} \sum_{\tau=1}^{t} \ell_\tau(w) + \frac{1}{\eta}R(w), \tag{19}$$

where R is (generally) a strongly convex function with respect to an appropriate norm. Note that
the intuition behind adding regularization is to make the algorithm stable. Our analysis of FTRL
explicitly captures this intuition by establishing the stability condition, while the forward regret bound
follows easily from the forward regret analysis of FTL given above.

Theorem 7. Let each loss function $\ell_t$ be L-Lipschitz continuous, let the diameter (as measured in $\|\cdot\|$) of
the set C be D, and let R be a 1-strongly convex regularization function. Then, the regret incurred by the
Follow The Regularized Leader (FTRL) algorithm (see (19)) is bounded by:

$$\mathcal{R}_{FTRL}(T) \le 2L\sqrt{\|\nabla R\|_* D}\,\sqrt{T},$$

where $\|\nabla R\|_* = \sup_{w \in C} \|\nabla R(w)\|_*$.

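Proof. As for FTL, we again bound the regret by first proving stability and forward regret bounds.

Stability: Similar to (9) and (10), using strong convexity and the optimality conditions for the t-th and
(t - 1)-th steps, we get the following relations:

$$\sum_{\tau=1}^{t} \ell_\tau(w_t) + \frac{1}{\eta}R(w_t) \ge \sum_{\tau=1}^{t} \ell_\tau(w_{t+1}) + \frac{1}{\eta}R(w_{t+1}) + \frac{1}{2\eta}\|w_t - w_{t+1}\|^2, \tag{20}$$

$$\sum_{\tau=1}^{t-1} \ell_\tau(w_{t+1}) + \frac{1}{\eta}R(w_{t+1}) \ge \sum_{\tau=1}^{t-1} \ell_\tau(w_t) + \frac{1}{\eta}R(w_t) + \frac{1}{2\eta}\|w_t - w_{t+1}\|^2. \tag{21}$$

Combining (20) and (21) gives $\ell_t(w_t) - \ell_t(w_{t+1}) \ge \frac{1}{\eta}\|w_t - w_{t+1}\|^2$, and by Lipschitz continuity of $\ell_t$:

$$\|w_t - w_{t+1}\| \le L\eta. \tag{22}$$

Hence,

$$S_{FTRL}(T) = \sum_{t=1}^{T} \|w_t - w_{t+1}\| \le L\eta T. \tag{23}$$

Choosing $\eta = 1/\sqrt{T}$ satisfies the online stability condition for FTRL.

Forward Regret: Setting $\ell_0(w) = \frac{1}{\eta}R(w)$ and $w_1 = \arg\min_{w \in C} R(w)$, FTRL is the same as FTL
with an additional 0-th step loss function $\ell_0(\cdot) = \frac{1}{\eta}R(\cdot)$. Hence, using (16), we obtain:

$$\sum_{t=1}^{T} \ell_t(w^*) + \frac{1}{\eta}\left(R(w^*) - R(w_1)\right) \ge \sum_{t=1}^{T} \ell_t(w_{t+1}). \tag{24}$$

Hence,

$$\mathcal{FR}_{FTRL}(T) \le \frac{1}{\eta}\left(R(w^*) - R(w_1)\right) \le \frac{1}{\eta}\nabla R(w^*)^\top (w^* - w_1) \le \frac{\|\nabla R\|_* D}{\eta}, \tag{25}$$

where the second inequality follows from the convexity of R and the last one follows from the
generalized Cauchy-Schwarz inequality $a^\top b \le \|a\|_* \|b\|$. Again, $\eta = 1/\sqrt{T}$ provides vanishing forward regret
for FTRL. Hence, using Theorem 4,

$$\mathcal{R}_{FTRL}(T) \le \frac{\|\nabla R\|_* D}{\eta} + L^2\eta T \le 2L\sqrt{\|\nabla R\|_* D}\,\sqrt{T}, \tag{26}$$

by choosing $\eta = \sqrt{\|\nabla R\|_* D}/(L\sqrt{T})$. □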
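A concrete instance helps to see (22) and (26) at work. The sketch below is our own illustration under explicit assumptions: linear losses $\ell_t(w) = \langle g_t, w\rangle$ with $\|g_t\|_2 \le 1$ (so $L = 1$), $R(w) = \frac12\|w\|_2^2$, and C the Euclidean ball of radius $\rho$ (so $D = 2\rho$ and $\|\nabla R\|_* = \rho$). Completing the square in (19) shows that the FTRL iterate is the projection of $-\eta\sum_{\tau \le t} g_\tau$ onto C.

```python
import math
import random

def project_ball(v, rho):
    """Euclidean projection onto C = {w : ||w||_2 <= rho}."""
    n = math.sqrt(sum(x * x for x in v))
    return list(v) if n <= rho else [x * rho / n for x in v]

def ftrl_linear_ball(gs, eta, rho):
    """FTRL (19) for l_t(w) = <g_t, w> and R(w) = ||w||_2^2 / 2 on the rho-ball:
    w_{t+1} is the projection of -eta * (g_1 + ... + g_t) onto C."""
    G = [0.0] * len(gs[0])
    ws = [project_ball(G, rho)]          # w_1 = argmin_C R = 0
    for g in gs:
        G = [Gi + gi for Gi, gi in zip(G, g)]
        ws.append(project_ball([-eta * Gi for Gi in G], rho))
    return ws, G

def ftrl_experiment(T=400, d=3, rho=1.0, seed=0):
    rng = random.Random(seed)
    gs = []
    for _ in range(T):                   # gradients with ||g_t||_2 <= 1, hence L = 1
        g = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        n = math.sqrt(sum(x * x for x in g))
        gs.append([x / max(1.0, n) for x in g])
    L, D, grad_R = 1.0, 2.0 * rho, rho   # Lipschitz constant, diameter, sup_C ||grad R||_2
    eta = math.sqrt(grad_R * D) / (L * math.sqrt(T))   # step size used in (26)
    ws, G = ftrl_linear_ball(gs, eta, rho)
    steps = [math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) for u, v in zip(ws, ws[1:])]
    played = sum(sum(gi * wi for gi, wi in zip(g, w)) for g, w in zip(gs, ws))
    best = -rho * math.sqrt(sum(x * x for x in G))     # value of the best fixed point
    bound = 2.0 * L * math.sqrt(grad_R * D) * math.sqrt(T)
    return max(steps), L * eta, played - best, bound

max_step, L_eta, regret, bound = ftrl_experiment()
```

The largest step never exceeds $L\eta$, in line with (22), since projection onto a convex set is non-expansive; the regret stays below the bound of Theorem 7.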
7.3 Regularized Dual Averaging (RDA) 

Regularized Dual Averaging [24] is a popular online learning method for OCP scenarios where
each loss function is regularized by the same regularization function, i.e., the functions at each step are
of the form $\ell'_t(w) = \ell_t(w) + r(w)$, where r is a regularization function. RDA computes the iterates
using the following rule:

$$\text{RDA:}\quad w_{t+1} = \arg\min_{w \in C} \sum_{\tau=1}^{t} g_\tau^\top w + t \cdot r(w) + \beta_t h(w), \tag{27}$$

where $g_t = \nabla\ell_t(w_t)$, h(w) is a strongly convex regularizer that is separately added and $\beta_t$ is the
trade-off parameter. [24] shows that the above update obtains $O(\sqrt{T})$ regret for general Lipschitz
continuous functions and $O(\ln T)$ regret when the regularizer r is strongly convex.
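For the common choice of an $\ell_1$ regularizer, the RDA step (27) has a closed form, which illustrates why RDA is popular for producing sparse iterates. The sketch below is our own illustration under assumptions that differ from the compact C used in the theorems: $r(w) = \lambda\|w\|_1$, $h(w) = \frac12\|w\|_2^2$ and $C = \mathbb{R}^d$. Coordinate-wise, minimizing $G_i w + t\lambda|w| + \frac{\beta_t}{2}w^2$ gives a soft-thresholded dual average. Function names are hypothetical.

```python
import math

def soft_threshold(x, tau):
    """sign(x) * max(|x| - tau, 0)."""
    return math.copysign(max(abs(x) - tau, 0.0), x)

def rda_l1_step(G, t, lam, beta):
    """RDA step (27) with r(w) = lam * ||w||_1, h(w) = ||w||_2^2 / 2 and C = R^d.

    G = g_1 + ... + g_t. Coordinate-wise, argmin G_i * w + t * lam * |w| + (beta / 2) * w^2
    equals -soft_threshold(G_i, t * lam) / beta.
    """
    return [-soft_threshold(Gi, t * lam) / beta for Gi in G]

def rda_objective(w, G, t, lam, beta):
    """The objective minimized in (27) under the same assumptions."""
    return (sum(Gi * wi for Gi, wi in zip(G, w)) + t * lam * sum(abs(wi) for wi in w)
            + 0.5 * beta * sum(wi * wi for wi in w))

# Usage with beta_t = sqrt(t), as in Theorem 9 below.
G, t, lam = [3.0, -0.5, 1.2], 4, 0.2
beta = math.sqrt(t)
w = rda_l1_step(G, t, lam, beta)   # approximately [-1.1, 0.0, -0.2]; the middle entry is exactly 0
```

Any coordinate whose accumulated gradient satisfies $|G_i| \le t\lambda$ is set exactly to zero, which is the sparsity property referred to in Section 4.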

Note that RDA is the same as FTRL except for the linearization of the first part of the loss function $\ell'_t$.
Hence, the same regret analysis as for FTRL should hold. However, the analysis by [24] shows that by using the
special structure of $\ell'_t$, regret can be bounded even without assuming Lipschitz continuity of the
regularization function r. Below, we show that using the same recipe of bounding stability and forward regret
leads to a significantly simpler analysis of RDA as well. Unlike the previous cases, this analysis is
slightly trickier, as we cannot assume Lipschitz continuity of r to prove stability.

Theorem 8. Let each loss function $\ell_t$ be L-Lipschitz continuous, let r be an $\alpha$-strongly convex function
and, w.l.o.g., $\min_{w \in C} r(w) = 0$. Then, using $\beta_t = 0$ at each step, the regret of RDA (see (27)) is bounded
by $\frac{2L^2}{\alpha}(1 + \ln T)$.

Proof. Stability: By strong convexity of r and optimality of $w_{t+1}$ and $w_t$ for the t-th and (t - 1)-th
steps, respectively,

$$\frac{1}{t}\sum_{\tau=1}^{t} g_\tau^\top (w_t - w_{t+1}) + r(w_t) - r(w_{t+1}) \ge \frac{\alpha}{2}\|w_t - w_{t+1}\|^2,$$

$$\frac{1}{t-1}\sum_{\tau=1}^{t-1} g_\tau^\top (w_{t+1} - w_t) + r(w_{t+1}) - r(w_t) \ge \frac{\alpha}{2}\|w_t - w_{t+1}\|^2.$$

Adding the above two inequalities,

$$\alpha\|w_t - w_{t+1}\|^2 \le \left(\frac{1}{t}g_t - \frac{1}{t(t-1)}\sum_{\tau=1}^{t-1} g_\tau\right)^\top (w_t - w_{t+1}) \le \frac{2L}{t}\|w_t - w_{t+1}\|, \tag{28}$$

where the second inequality follows from the Lipschitz continuity of $\ell_\tau$, $1 \le \tau \le t$ (so that $\|g_\tau\|_* \le L$).
After simplification, $\|w_t - w_{t+1}\| \le \frac{2L}{\alpha t}$, and adding this expression for all $t = 1, \ldots, T$,

$$S_{RDA}(T) \le \frac{2L}{\alpha}(1 + \ln T). \tag{29}$$

Note that the above stability analysis is slightly different from that of FTL, as we are able to bound
the stability by the Lipschitz constant of $\ell_t$ only, rather than that of $\ell_t + r$.

Forward Regret: When $\beta_t = 0$, the forward regret bound follows easily from the forward regret of FTL,
where the loss function at each step is $g_t^\top w + r(w)$. Hence,

$$\mathcal{FR}_{RDA}(T) \le 0. \tag{30}$$

Hence, using Theorem 4,

$$\sum_{t=1}^{T} \left(g_t^\top (w_t - w^*) + r(w_t) - r(w^*)\right) \le \frac{2L^2}{\alpha}(1 + \ln T).$$

The result now follows using convexity of $\ell_t$, i.e., $\ell_t(w_t) - \ell_t(w^*) \le g_t^\top (w_t - w^*)$. □
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Next, we bound regret incurred by RDA for general convex, Lipschitz continuous functions. 

Theorem 9. Let each loss function i t be L-Lipschitz continuous and wlog min w6 c r(w) = and 
< h(w) < D 2 , Vw G C. Now, using fit = \ft at each step, regret of RDA (see (27)) is bounded 

by 2 -^VT. 

Proof. Stability: Again, by strong convexity of h and optimality of Wt+i and Wf for the t-th and 
t — 1-th step respectively, 

1 \ ■> fit fit 

- ^ gT • (w t - w t+ i) + r(wt) - r(w t+ i) + —(h(w t ) - h(w t+1 )) > ^l|w t - w t+ i|| 2 , 

r— 1 ~ 

1 \~ ^ /?t 1 fit 1 

7— [Xj gr • (w t+ i - w t ) +r(w t+ i) - r(w t ) + ^-(Mwt+i) - /i(w t )) > 2 (t~-l) ^ Wt ~~ Wt+1 l| 2 ' 
Adding the above two equations, using Lipschitz continuity of £ t and upper bound on h, 

1 W. - W. , 1 2 W. - W,, 1 ' n2 



Solving for ||w t — w t+ i||, we get, 

II II <r 2L + D ,™ 

||w t - w t+ i|| < — . (32) 

yi — 1 

Hence, 

5rda(T) < {2L + D)Vt. (33) 
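Forward Regret: Using optimality of $w_{T+1}$,

$$\sum_{t=1}^{T} g_t^\top w^* + T r(w^*) + \sqrt{T}\, h(w^*) \ge \sum_{t=1}^{T} g_t^\top w_{T+1} + T r(w_{T+1}) + \sqrt{T}\, h(w_{T+1}). \qquad (34)$$

Now, using optimality of $w_T$,

$$\sum_{t=1}^{T-1} g_t^\top w_{T+1} + (T-1) r(w_{T+1}) + \sqrt{T-1}\, h(w_{T+1}) \ge \sum_{t=1}^{T-1} g_t^\top w_T + (T-1) r(w_T) + \sqrt{T-1}\, h(w_T). \qquad (35)$$

Adding the above two inequalities and dropping the nonnegative term $(\sqrt{T} - \sqrt{T-1})\,h(w_{T+1})$,

$$\sum_{t=1}^{T} g_t^\top w^* + T r(w^*) + \sqrt{T}\, h(w^*) \ge g_T^\top w_{T+1} + r(w_{T+1}) + \sum_{t=1}^{T-1} g_t^\top w_T + (T-1) r(w_T) + \sqrt{T-1}\, h(w_T). \qquad (36)$$

Similarly, combining the optimality of $w_t$, $t = T, \ldots, 1$, as in (35) recursively with (36),

$$\sum_{t=1}^{T} g_t^\top w^* + T r(w^*) + \sqrt{T}\, h(w^*) \ge \sum_{t=1}^{T}\left(g_t^\top w_{t+1} + r(w_{t+1})\right). \qquad (37)$$

Hence, using $\min_{w \in C} r(w) = 0$ and $h(w^*) \le D^2$,

$$\mathcal{FR}_{RDA}(T) \le \sqrt{T}\, D^2. \qquad (38)$$

Hence, using Theorem 4 and convexity of each $\ell_t$,

$$\mathcal{R}_{RDA}(T) \le \left(D^2 + L(2L + D)\right)\sqrt{T}. \qquad (39)$$

□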
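To make the RDA update concrete, here is a minimal sketch assuming $C = \mathbb{R}^d$ (no constraint set, unlike Theorem 9), $r(w) = \lambda\|w\|_1$, and $h(w) = \frac{1}{2}\|w\|^2$ with $\beta_t = \sqrt{t}$. Under these assumptions the objective $\sum_{\tau \le t} g_\tau^\top w + t\lambda\|w\|_1 + \beta_t h(w)$ separates per coordinate and has a soft-thresholding closed form. The function names and the toy gradient sequence are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def soft_threshold(x: float, a: float) -> float:
    """Return sign(x) * max(|x| - a, 0)."""
    if x > a:
        return x - a
    if x < -a:
        return x + a
    return 0.0


def rda_l1_step(grad_sum: Sequence[float], t: int, lam: float) -> List[float]:
    """Exact RDA update for r(w) = lam*||w||_1, h(w) = 0.5*||w||^2, beta_t = sqrt(t).

    Minimizes sum_tau g_tau.w + t*lam*||w||_1 + beta_t*h(w) over R^d
    (unconstrained, an assumption of this sketch), coordinate by coordinate.
    """
    beta = math.sqrt(t)
    return [-soft_threshold(g, t * lam) / beta for g in grad_sum]


def run_rda(grad_fn: Callable[[int, List[float]], List[float]],
            dim: int, T: int, lam: float) -> List[List[float]]:
    """Run T rounds of RDA; grad_fn(t, w_t) returns a (sub)gradient of l_t at w_t."""
    w = [0.0] * dim              # w_1 = 0 minimizes both h and r
    grad_sum = [0.0] * dim
    iterates = [w]
    for t in range(1, T + 1):
        g = grad_fn(t, w)
        grad_sum = [s + gi for s, gi in zip(grad_sum, g)]
        w = rda_l1_step(grad_sum, t, lam)
        iterates.append(w)
    return iterates


def stability(iterates: Sequence[Sequence[float]]) -> float:
    """S(T) = sum_t ||w_t - w_{t+1}||."""
    return sum(math.dist(u, v) for u, v in zip(iterates, iterates[1:]))


def grad_fn(t: int, w: List[float]) -> List[float]:
    # Subgradient of |w[0] - 1| plus a small alternating linear term on w[1].
    g0 = 1.0 if w[0] > 1.0 else -1.0
    g1 = 0.1 if t % 2 else -0.1
    return [g0, g1]


its = run_rda(grad_fn, dim=2, T=100, lam=0.2)
print(its[-1], stability(its))
```

The second coordinate stays exactly zero because its running gradient sum never exceeds the threshold $t\lambda$, which illustrates the sparsity that motivates RDA.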
7.4 Composite Objective Mirror Descent (COMiD)

Similar to RDA, COMiD [8] is also designed to handle regularized loss functions of the form $\ell_t + r$. Just as RDA is an extension of FTRL to composite regularized loss functions, COMiD is an extension of IOL. Formally,

$$\text{COMiD:}\quad w_{t+1} = \arg\min_{w \in C}\; \eta_t\left(g_t^\top w + r(w)\right) + \mathcal{D}_R(w, w_t),$$

where $g_t = \nabla\ell_t(w_t)$ and $\mathcal{D}_R(\cdot, \cdot)$ is the Bregman divergence with $R$ being the generating function. Now, similar to RDA, the regret analysis of COMiD follows directly from the regret analysis of IOL. However, [8] presents an improved analysis that can also handle non-Lipschitz regularizers $r$. Here, we show that using our stability/forward-regret based recipe, we can obtain similar regret bounds with a significantly simpler analysis.

Theorem 10. Let each loss function be of the form $\ell_t + r$, where $\ell_t$ is an $L$-Lipschitz continuous function and $r$ is a regularization function. Let the diameter of the set $C$ be $D$, and let $\mathcal{D}_R(\cdot, \cdot)$ be a Bregman divergence with $R$ being the strongly convex generating function. Let $w_1 = \arg\min_{w \in C} r(w)$. Also, let $R$ be a positive function. Then, the regret incurred by the Composite Objective Mirror Descent (COMiD) algorithm is bounded by:

$$\mathcal{R}_{COMiD}(T) \le 2L\sqrt{2R(w^*)}\sqrt{T}.$$

Furthermore, if each function $\ell_t$ is $\alpha$-strongly convex w.r.t. $\mathcal{D}_R$, then

$$\mathcal{R}_{COMiD}(T) \le \frac{2L^2}{\alpha}(1 + \ln T) + \alpha R(w^*).$$



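Proof. Stability: By optimality of $w_{t+1}$,

$$\eta_t\left(g_t^\top w_t + r(w_t)\right) \ge \mathcal{D}_R(w_{t+1}, w_t) + \eta_t\left(g_t^\top w_{t+1} + r(w_{t+1})\right),$$
$$\Rightarrow\quad L\|w_t - w_{t+1}\| + r(w_t) \ge r(w_{t+1}) + \frac{1}{2\eta_t}\|w_t - w_{t+1}\|^2. \qquad (40)$$

Adding the above inequality for $t = 1, \ldots, T$ and using the fact that $r(w_1) \le r(w_{T+1})$ (by the definition of $w_1$),

$$\sum_{t=1}^{T}\frac{1}{2L\eta_t}\|w_t - w_{t+1}\|^2 \le \sum_{t=1}^{T}\|w_t - w_{t+1}\|. \qquad (41)$$

Using the Cauchy-Schwarz inequality together with (41) and (40),

$$\Big(\sum_{t=1}^{T}\|w_t - w_{t+1}\|\Big)^2 \le \sum_{t=1}^{T}\frac{\|w_t - w_{t+1}\|^2}{\eta_t}\,\sum_{t=1}^{T}\eta_t \le 2L\sum_{t=1}^{T}\|w_t - w_{t+1}\|\,\sum_{t=1}^{T}\eta_t. \qquad (42)$$

Hence,

$$\mathcal{S}_{COMiD}(T) = \sum_{t=1}^{T}\|w_t - w_{t+1}\| \le 2L\sum_{t=1}^{T}\eta_t. \qquad (43)$$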

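Forward Regret: Forward regret follows directly from the forward regret of IOL (see (52)), i.e.,

$$\mathcal{FR}_{COMiD}(T) = \sum_{t=1}^{T}\left(g_t^\top(w_{t+1} - w^*) + r(w_{t+1}) - r(w^*)\right) \qquad (44)$$
$$\le \frac{1}{\eta_1}\mathcal{D}_R(w^*, w_1) + \sum_{t=2}^{T}\left(\frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} - \alpha\right)\mathcal{D}_R(w^*, w_t). \qquad (45)$$

Both regret bounds follow using convexity of each $\ell_t$ and setting the step sizes $\eta_t$ as in IOL (see (54), (55)). □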
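As a concrete instance of the COMiD update, assume $R(w) = \frac{1}{2}\|w\|^2$ (so $\mathcal{D}_R(w, v) = \frac{1}{2}\|w - v\|^2$), $C = \mathbb{R}^d$, and $r(w) = \lambda\|w\|_1$. The minimization then reduces to a gradient step on the linearized loss followed by soft-thresholding at $\eta\lambda$; only $\ell_t$ is linearized, while $r$ is kept exactly. This is a sketch under those assumptions, and `comid_l1_step` is a hypothetical name.

```python
from typing import List, Sequence


def soft_threshold(x: float, a: float) -> float:
    """Return sign(x) * max(|x| - a, 0)."""
    if x > a:
        return x - a
    if x < -a:
        return x + a
    return 0.0


def comid_l1_step(w: Sequence[float], g: Sequence[float],
                  eta: float, lam: float) -> List[float]:
    """COMiD step with R(w) = 0.5*||w||^2 and r(w) = lam*||w||_1 over C = R^d.

    argmin_w eta*(g.w + lam*||w||_1) + 0.5*||w - w_t||^2 equals a gradient step
    on the linearized loss followed by soft-thresholding at eta*lam.
    """
    return [soft_threshold(wi - eta * gi, eta * lam) for wi, gi in zip(w, g)]


w_next = comid_l1_step([1.0, 0.1], [1.0, 0.0], eta=0.5, lam=0.4)
print(w_next)  # approximately [0.3, 0.0]
```

With `lam=0` the step reduces to a plain online gradient step, which matches the remark in 7.5 that MD is COMiD with $r = 0$.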
7.5 Mirror Descent (MD)

Mirror descent algorithms are a generalization of Zinkevich's Generalized Infinitesimal Gradient Ascent (GIGA) algorithm [25], where the regularization can be drawn from any Bregman divergence family. Formally,

$$\text{MD:}\quad w_{t+1} = \arg\min_{w \in C}\; \eta_t g_t^\top w + \mathcal{D}_R(w, w_t), \qquad (46)$$

where $\mathcal{D}_R$ is the Bregman divergence generated by $R$. Note that the MD update is the same as COMiD with $r = 0$. Hence, our stability analysis as well as the $O(\sqrt{T})$ regret analysis for general convex functions follow directly. However, for strongly convex functions, our approach does not directly yield an appropriate forward regret bound, the primary reason being the linearization of the function. Instead, we can obtain a regret bound using the standard approach (see [25]) and then obtain a forward regret bound using Theorem 4.
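One standard instance of (46), chosen here for illustration rather than taken from the paper, is $C$ = the probability simplex with $R(w) = \sum_i w_i \log w_i$. On the simplex $\mathcal{D}_R$ is the KL divergence, and (46) becomes the multiplicative (exponentiated-gradient) update $w_{t+1,i} \propto w_{t,i}\exp(-\eta_t g_{t,i})$. The name `md_entropy_step` is hypothetical.

```python
import math
from typing import List, Sequence


def md_entropy_step(w: Sequence[float], g: Sequence[float], eta: float) -> List[float]:
    """MD step (46) on the probability simplex with R(w) = sum_i w_i*log(w_i).

    For this R, D_R restricted to the simplex is the KL divergence, and the
    minimizer of eta*g.w + KL(w || w_t) is the multiplicative update below.
    """
    # Shift exponents by their max for numerical safety; the shift cancels on normalizing.
    exps = [-eta * gi for gi in g]
    shift = max(exps)
    unnorm = [wi * math.exp(e - shift) for wi, e in zip(w, exps)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


print(md_entropy_step([0.5, 0.5], [1.0, 0.0], eta=math.log(2.0)))  # approximately [1/3, 2/3]
```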

7.6 Implicit Online Learning (IOL)

Implicit online learning [12] is similar to typical Mirror Descent algorithms but without linearizing the loss function. Specifically, at iteration $t + 1$,

$$\text{IOL:}\quad w_{t+1} = \arg\min_{w \in C}\left(\mathcal{D}_R(w, w_t) + \eta_t\ell_t(w)\right), \qquad (47)$$

where $\mathcal{D}_R(\cdot, \cdot)$ is a Bregman divergence with $R$ being the generating function. It was shown in [12] that using any strongly convex $R$, the above update leads to $O(\sqrt{T})$ regret for any Lipschitz continuous convex functions $\ell_t$. That paper also shows that if $R$ is selected to be the squared $\ell_2$-norm and each function $\ell_t$ is strongly convex and has a Lipschitz continuous gradient, then $O(\ln T)$ regret can also be achieved. Below, using our recipe of forward regret and stability, we give significantly simpler proofs for both the $O(\sqrt{T})$ and the $O(\ln T)$ regret. Furthermore, our $O(\ln T)$ proof requires only strong convexity and Lipschitz continuity, in contrast to strong convexity and Lipschitz continuity of the gradient in [12]. Also, our analysis can handle any strongly convex $R$, rather than just the squared $\ell_2$-norm regularizer.
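To see what "without linearizing" means in practice, consider an illustrative case not taken from the paper: $R = \frac{1}{2}\|w\|^2$, $C = \mathbb{R}^d$, and the squared loss $\ell_t(w) = \frac{1}{2}(x_t^\top w - y_t)^2$. The squared loss is not globally Lipschitz, so Theorem 11 does not cover it; the example only shows the form of update (47). Setting the gradient of the IOL objective to zero gives a closed form, which is compared below with an explicit gradient step. Both function names are hypothetical.

```python
from typing import List, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def iol_squared_step(w: Sequence[float], x: Sequence[float], y: float,
                     eta: float) -> List[float]:
    """IOL step (47) with R = 0.5*||w||^2, C = R^d and l_t(w) = 0.5*(x.w - y)^2.

    Setting the gradient of 0.5*||w - w_t||^2 + eta*l_t(w) to zero gives
    w = w_t - c*x with c = eta*(x.w_t - y) / (1 + eta*||x||^2).
    """
    c = eta * (dot(x, w) - y) / (1.0 + eta * dot(x, x))
    return [wi - c * xi for wi, xi in zip(w, x)]


def ogd_squared_step(w: Sequence[float], x: Sequence[float], y: float,
                     eta: float) -> List[float]:
    """Explicit (linearized) gradient step on the same loss, for comparison."""
    r = dot(x, w) - y
    return [wi - eta * r * xi for wi, xi in zip(w, x)]


w, x, y, eta = [0.0, 0.0], [1.0, 0.0], 1.0, 100.0
print(dot(x, iol_squared_step(w, x, y, eta)) - y)  # residual -1/101: shrinks, no overshoot
print(dot(x, ogd_squared_step(w, x, y, eta)) - y)  # residual 99.0: overshoots
```

Even with a very large step size the implicit update only shrinks the residual, which is one intuition for why IOL iterates are stable.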

Theorem 11. Let each loss function $\ell_t$ be $L$-Lipschitz continuous, let the diameter of the set $C$ be $D$, and let $\mathcal{D}_R$ be a Bregman divergence with $R$ being the strongly convex generating function. Also, let $R$ be a positive function. Then, the regret incurred by the Implicit Online Learning (IOL) algorithm (see (47)) is bounded by:

$$\mathcal{R}_{IOL}(T) \le 2L\sqrt{2R(w^*)}\sqrt{T}.$$

Furthermore, if each function $\ell_t$ is $\alpha$-strongly convex w.r.t. $\mathcal{D}_R$, i.e., $\mathcal{D}_{\ell_t}(u, v) \ge \alpha\mathcal{D}_R(u, v)$ for all $u, v \in C$, then

$$\mathcal{R}_{IOL}(T) \le \frac{2L^2}{\alpha}(1 + \ln T) + \alpha R(w^*).$$

Proof. Here again, we follow the recipe of proving stability and forward regret.

Stability: Stability again follows easily by using optimality of $w_{t+1}$ and comparing it to $w_t$. Formally,

$$\eta_t\ell_t(w_t) \ge \mathcal{D}_R(w_{t+1}, w_t) + \eta_t\ell_t(w_{t+1}),$$
$$\Rightarrow\quad \eta_t\ell_t(w_t) \ge \frac{1}{2}\|w_{t+1} - w_t\|^2 + \eta_t\ell_t(w_{t+1}),$$
$$\Rightarrow\quad 2L\eta_t \ge \|w_{t+1} - w_t\|, \qquad (48)$$

where the second line follows from the strong convexity of $R$ and the last one follows by using Lipschitz continuity of $\ell_t$ and canceling $\|w_{t+1} - w_t\|$ from both sides. Hence,

$$\mathcal{S}_{IOL}(T) \le 2L\sum_{t=1}^{T}\eta_t. \qquad (49)$$

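Forward Regret: Similarly, forward regret follows by using optimality of $w_{t+1}$ and comparing it to $w^*$. Formally,

$$(w^* - w_{t+1})^\top\left(\eta_t\nabla\ell_t(w_{t+1}) + \nabla R(w_{t+1}) - \nabla R(w_t)\right) \ge 0,$$
$$\Rightarrow\quad (w^* - w_{t+1})^\top\left(\nabla R(w_{t+1}) - \nabla R(w_t)\right) \ge \eta_t\nabla\ell_t(w_{t+1})^\top(w_{t+1} - w^*),$$
$$\Rightarrow\quad \mathcal{D}_R(w^*, w_t) - \mathcal{D}_R(w^*, w_{t+1}) - \mathcal{D}_R(w_{t+1}, w_t) \ge \eta_t\nabla\ell_t(w_{t+1})^\top(w_{t+1} - w^*), \qquad (50)$$

where (50) follows from the previous step using the three-point inequality [17]. Now, if $\ell_t$ is $\alpha$-strongly convex w.r.t. $\mathcal{D}_R(\cdot, \cdot)$, then

$$\nabla\ell_t(w_{t+1})^\top(w_{t+1} - w^*) \ge \ell_t(w_{t+1}) - \ell_t(w^*) + \alpha\mathcal{D}_R(w^*, w_{t+1}). \qquad (51)$$

Note that strong convexity w.r.t. $\mathcal{D}_R$ is a stronger condition than the usual strong convexity w.r.t. the $\ell_2$ norm. Also, for the first part of the theorem, we can assume $\alpha = 0$.

Using (50) and (51), and adding over all $T$ steps,

$$\mathcal{FR}_{IOL}(T) = \sum_{t=1}^{T}\left(\ell_t(w_{t+1}) - \ell_t(w^*)\right) \le \frac{1}{\eta_1}\mathcal{D}_R(w^*, w_1) + \sum_{t=2}^{T}\left(\frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} - \alpha\right)\mathcal{D}_R(w^*, w_t). \qquad (52)$$

Hence, using Theorem 4 with (49) and (52),

$$\mathcal{R}_{IOL}(T) \le 2L^2\sum_{t=1}^{T}\eta_t + \frac{1}{\eta_1}\mathcal{D}_R(w^*, w_1) + \sum_{t=2}^{T}\left(\frac{1}{\eta_t} - \frac{1}{\eta_{t-1}} - \alpha\right)\mathcal{D}_R(w^*, w_t). \qquad (53)$$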

Now, let us first consider the case $\alpha = 0$, i.e., when the functions $\ell_t$ need not be strongly convex. In this case, selecting each $\eta_t = \eta$ and $w_1 = \arg\min_{w \in C} R(w)$, we can use the optimality of $w_1$ to claim $\nabla R(w_1)^\top(w^* - w_1) \ge 0$. Coupling this with the non-negativity of $R$, we get $\mathcal{D}_R(w^*, w_1) \le R(w^*)$. This gives

$$\mathcal{R}_{IOL}(T) \le 2\eta L^2 T + \frac{1}{\eta}\mathcal{D}_R(w^*, w_1) \le 2L\sqrt{2R(w^*)T} \qquad (54)$$

by optimizing over the choice of $\eta$. Next, for the case $\alpha > 0$, selecting $\eta_t = \frac{1}{\alpha t}$ and $w_1 = \arg\min_{w \in C} R(w)$,

$$\mathcal{R}_{IOL}(T) \le \frac{2L^2}{\alpha}(1 + \ln T) + \alpha R(w^*). \qquad (55)$$

Hence proved. □
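The step-size choices behind (54) and (55) can be checked numerically. With a constant step, the bound $2\eta L^2T + R(w^*)/\eta$ is minimized at $\eta = \sqrt{R(w^*)/(2L^2T)}$, giving $2L\sqrt{2R(w^*)T}$. With $\eta_t = 1/(\alpha t)$, every coefficient $1/\eta_t - 1/\eta_{t-1} - \alpha$ in (53) vanishes, and the harmonic sum is at most $1 + \ln T$. The function names and the sample constants are illustrative only.

```python
import math


def iol_bound_constant_step(L: float, R_star: float, T: int, eta: float) -> float:
    """Middle expression of (54): 2*eta*L^2*T + R(w*)/eta (using D_R(w*, w_1) <= R(w*))."""
    return 2.0 * eta * L * L * T + R_star / eta


def best_constant_step(L: float, R_star: float, T: int) -> float:
    """Minimizer of the expression above: eta = sqrt(R(w*) / (2 L^2 T))."""
    return math.sqrt(R_star / (2.0 * L * L * T))


def iol_bound_strongly_convex(L: float, R_star: float, T: int, alpha: float) -> float:
    """(53) with eta_t = 1/(alpha*t): the (1/eta_t - 1/eta_{t-1} - alpha) terms are zero,
    leaving 2*L^2*sum_t 1/(alpha*t) + alpha*R(w*)."""
    harmonic = sum(1.0 / t for t in range(1, T + 1))
    return 2.0 * L * L * harmonic / alpha + alpha * R_star


L, R_star, T = 2.0, 3.0, 10_000
eta = best_constant_step(L, R_star, T)
print(iol_bound_constant_step(L, R_star, T, eta), 2 * L * math.sqrt(2 * R_star * T))
print(iol_bound_strongly_convex(L, R_star, T, alpha=0.5),
      2 * L * L / 0.5 * (1 + math.log(T)) + 0.5 * R_star)
```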

8 Analysis of approximate online algorithms 

We analyze approximate versions of online algorithms where the update at every step is not the exact minimizer of the corresponding objective but an approximate one. In particular, the updates minimize the objective up to an additive error $\delta_t$ at iteration $t$, as would commonly be obtained by an iterative optimization method. We show that even with such approximate updates we can obtain sublinear regret over $T$ steps for Regularized Dual Averaging (RDA) [24], FTRL, and IOL.

Although RDA requires solving an optimization problem at every step, it is successful in maintaining the sparsity of the intermediate iterates and thus finds use in a host of applications where sparsity is essential [24]. However, it is typically infeasible to solve an optimization problem exactly at every step. Hence, it is interesting to analyze the behaviour of RDA under such approximate updates.

8.1 Approximate RDA

The exact updates of the original RDA algorithm are given by

$$\text{RDA:}\quad w^*_{t+1} = \arg\min_{w \in C}\; \sum_{\tau=1}^{t} g_\tau^\top w + t \cdot r(w) + \beta_t h(w),$$

where $g_\tau = \nabla\ell_\tau(w_\tau)$ is the gradient of the loss function at iteration $\tau$, $r$ is a regularization function which is part of the objective, and $h$ is a strongly convex regularizer added by the algorithm. Using $w_{t+1}$ to denote the approximate update in this case, we have

$$\sum_{\tau=1}^{t} g_\tau^\top w_{t+1} + t \cdot r(w_{t+1}) + \beta_t h(w_{t+1}) \le \sum_{\tau=1}^{t} g_\tau^\top w^*_{t+1} + t \cdot r(w^*_{t+1}) + \beta_t h(w^*_{t+1}) + \delta_t. \qquad (56)$$

The following theorem bounds the regret for approximate RDA. 
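Theorem 12. Let each loss function $\ell_t$ be $L$-Lipschitz continuous and, without loss of generality, let $\min_{w \in C} r(w) = 0$ and $0 \le h(w) \le D^2$ for all $w \in C$. Now, using $\beta_t = \sqrt{t}$ and $\delta_t = O(1/\sqrt{t})$ at each step, the regret of approximate RDA is bounded by $O(\sqrt{T})$.

Proof. Stability: Using (32), we know that

$$\|w^*_t - w^*_{t+1}\| \le \frac{2L + D}{\sqrt{t-1}}.$$

Using the triangle inequality, we can bound the gap between successive iterates of the approximate algorithm as

$$\|w_t - w_{t+1}\| \le \|w_t - w^*_t\| + \|w^*_t - w^*_{t+1}\| + \|w_{t+1} - w^*_{t+1}\|.$$

Let $f_t(w) = \sum_{\tau=1}^{t} g_\tau^\top w + t \cdot r(w) + \beta_t h(w)$; the function $f_t$ is $\beta_t\sigma_h$-strongly convex, where $\sigma_h$ is the coefficient of strong convexity of $h$. Using the optimality of $w^*_{t+1}$, we know that

$$f_t(w_{t+1}) \ge f_t(w^*_{t+1}) + \frac{\beta_t\sigma_h}{2}\|w_{t+1} - w^*_{t+1}\|^2.$$

Coupling this with (56), we have

$$\|w_{t+1} - w^*_{t+1}\| \le \sqrt{\frac{2\delta_t}{\beta_t\sigma_h}}.$$

Similarly, we have $\|w_t - w^*_t\| \le \sqrt{\frac{2\delta_{t-1}}{\beta_{t-1}\sigma_h}}$. Combining these, we get

$$\|w_t - w_{t+1}\| \le \frac{2L + D}{\sqrt{t-1}} + \sqrt{\frac{2\delta_t}{\beta_t\sigma_h}} + \sqrt{\frac{2\delta_{t-1}}{\beta_{t-1}\sigma_h}}.$$

This gives a bound on the stability

$$\mathcal{S}(T) = \sum_{t=1}^{T}\|w_t - w_{t+1}\| \le (2L + D)\sqrt{T} + 2\sum_{t=1}^{T}\sqrt{\frac{2\delta_t}{\beta_t\sigma_h}}.$$

Forward Regret: We have

$$\sum_{\tau=1}^{T} g_\tau^\top w^* + T \cdot r(w^*) + \beta_T h(w^*) \ge \sum_{\tau=1}^{T} g_\tau^\top w^*_{T+1} + T \cdot r(w^*_{T+1}) + \beta_T h(w^*_{T+1})$$
$$\ge \sum_{\tau=1}^{T} g_\tau^\top w_{T+1} + T \cdot r(w_{T+1}) + \beta_T h(w_{T+1}) - \delta_T.$$

Writing this inequality for all values of $t$, we have

$$\sum_{\tau=1}^{T} g_\tau^\top w^* + T \cdot r(w^*) + \beta_T h(w^*) \ge \sum_{t=1}^{T}\left(g_t^\top w_{t+1} + r(w_{t+1})\right) + \sum_{t=1}^{T}(\beta_t - \beta_{t-1})\,h(w_{t+1}) - \sum_{t}\delta_t.$$

After simplification, using the facts that $\beta_t - \beta_{t-1} \ge 0$ for all $t$ and $0 \le h(w) \le D^2$ for all $w$, we have

$$\mathcal{FR}(T) \le \beta_T h(w^*) + \sum_{t}\delta_t \le \sqrt{T}D^2 + \sum_{t}\delta_t,$$

using the fact that $\beta_t = \sqrt{t}$. Thus, by Theorem 4, the regret bound is given by

$$\mathcal{R}_T \le L\,\mathcal{S}(T) + \mathcal{FR}(T) \le L(2L + D)\sqrt{T} + 2L\sum_{t=1}^{T}\sqrt{\frac{2\delta_t}{\beta_t\sigma_h}} + \sqrt{T}D^2 + \sum_{t}\delta_t.$$

Using $\delta_t = O(1/\sqrt{t})$, the second term on the RHS is bounded by $O(T^{1/2})$, and all the other terms are also bounded by $O(T^{1/2})$, which gives the sublinear regret bound $\mathcal{R}_T \le O(T^{1/2})$. □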
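The step "coupling this with (56)" uses a general fact: if $f$ is $\mu$-strongly convex with minimizer $w^*$ and $f(w) - f(w^*) \le \delta$, then $\|w - w^*\| \le \sqrt{2\delta/\mu}$ (here $\mu = \beta_t\sigma_h$). The same fact yields (61) for IOL and the $\sqrt{2\delta}$ bound for FTRL. The numeric check below assumes a diagonal quadratic objective and early-stopped gradient descent as the inexact solver; both are illustrative choices, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def quad(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """f(w) = sum_i 0.5*a_i*w_i^2 - b_i*w_i, which is min(a)-strongly convex."""
    return sum(0.5 * ai * wi * wi - bi * wi for ai, bi, wi in zip(a, b, w))


def inexact_minimize(a: Sequence[float], b: Sequence[float], steps: int) -> List[float]:
    """A few gradient steps from 0: a stand-in for an iterative solver stopped early."""
    lr = 1.0 / max(a)
    w = [0.0] * len(a)
    for _ in range(steps):
        w = [wi - lr * (ai * wi - bi) for ai, bi, wi in zip(a, b, w)]
    return w


def distance_bound(a: Sequence[float], b: Sequence[float], steps: int) -> Tuple[float, float]:
    """Return (||w - w*||, sqrt(2*delta/mu)) for the inexact solution w."""
    w_star = [bi / ai for ai, bi in zip(a, b)]
    w = inexact_minimize(a, b, steps)
    delta = quad(a, b, w) - quad(a, b, w_star)   # additive suboptimality
    mu = min(a)
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(w, w_star)))
    return dist, math.sqrt(2.0 * max(delta, 0.0) / mu)


for k in (0, 5, 20):
    print(k, distance_bound([1.0, 4.0, 10.0], [1.0, -2.0, 3.0], k))
```

The solver only controls $\delta$; strong convexity converts that objective gap into the iterate distance that the stability argument needs.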
8.2 Approximate FTRL

Recall the original FTRL algorithm

$$w^*_{t+1} = \arg\min_{w \in C}\; \sum_{\tau=1}^{t}\eta_\tau\ell_\tau(w) + R(w),$$

where $R$ is the (strongly convex) regularizer. Our algorithm chooses $w_{t+1}$ such that

$$\sum_{\tau=1}^{t}\eta_\tau\ell_\tau(w_{t+1}) + R(w_{t+1}) \le \sum_{\tau=1}^{t}\eta_\tau\ell_\tau(w^*_{t+1}) + R(w^*_{t+1}) + \delta_{t+1}. \qquad (57)$$

For notational convenience, we write

$$S_t(w) = \sum_{\tau=1}^{t}\eta_\tau\ell_\tau(w) + R(w).$$

Since $R$ is strongly convex in $w$, $S_t$ is also strongly convex and satisfies

$$S_t(w_{t+1}) \ge S_t(w^*_{t+1}) + \langle \nabla S_t(w^*_{t+1}), w_{t+1} - w^*_{t+1}\rangle + \frac{1}{2}\|w_{t+1} - w^*_{t+1}\|^2.$$

But $S_t(w_{t+1}) \le S_t(w^*_{t+1}) + \delta_{t+1}$ by (57), and $\langle \nabla S_t(w^*_{t+1}), w_{t+1} - w^*_{t+1}\rangle \ge 0$ by the optimality of $w^*_{t+1}$. Thus

$$\delta_{t+1} \ge \frac{1}{2}\|w^*_{t+1} - w_{t+1}\|^2 \quad\Longrightarrow\quad \|w_{t+1} - w^*_{t+1}\| \le \sqrt{2\delta_{t+1}}.$$

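Stability: Using the standard stability bound of FTRL and assuming $\eta_t = \eta$ for all $t$, we have

$$\|w_t - w_{t+1}\| \le \|w_t - w^*_t\| + \|w^*_t - w^*_{t+1}\| + \|w^*_{t+1} - w_{t+1}\| \le \sqrt{2\delta_t} + L\eta + \sqrt{2\delta_{t+1}}.$$

Thus

$$\sum_{t=1}^{T}\|w_t - w_{t+1}\| \le LT\eta + \sum_{t=1}^{T}\left(\sqrt{2\delta_t} + \sqrt{2\delta_{t+1}}\right) \le LT\eta + 2\sum_{t=1}^{T}\sqrt{2\delta_t}, \qquad (58)$$

where the last step follows by assuming that $\delta_t$ is a strictly decreasing sequence in $t$.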

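Forward Regret: Using (57) and telescoping, we get

$$\sum_{t=1}^{T}\left(\ell_t(w_{t+1}) - \ell_t(w^*)\right) \le \frac{1}{\eta}\left(R(w^*) - R(w_1)\right) + \sum_{t=1}^{T}\frac{\delta_t}{\eta}.$$

Using the convexity of $R$ and the Cauchy-Schwarz inequality, we have

$$R(w^*) - R(w_1) \le \langle \nabla R(w^*), w^* - w_1\rangle \le \|\nabla R(w^*)\|\,\|w^* - w_1\| \le GD,$$

where $G$ bounds $\|\nabla R\|$ on $C$. Thus $\mathcal{FR}(T) \le \frac{GD}{\eta} + \sum_{t=1}^{T}\frac{\delta_t}{\eta}$. Using the stability theorem (Theorem 4) together with (58), we have

$$\mathcal{R}(T) \le L\,\mathcal{S}(T) + \mathcal{FR}(T) \le L^2T\eta + 2L\sum_{t=1}^{T}\sqrt{2\delta_t} + \frac{GD}{\eta} + \sum_{t=1}^{T}\frac{\delta_t}{\eta}.$$

Choosing $\eta = 1/\sqrt{T}$ and $\delta_t = \delta$ for all $t$, we get

$$\mathcal{R}_T \le GD\sqrt{T} + L^2\sqrt{T} + T^{3/2}\delta + 2\sqrt{2}\,LT\sqrt{\delta}.$$

The last two terms are equal when $\delta = 8L^2/T$; for this choice, the equality case of the AM-GM inequality gives

$$\mathcal{R}_T \le GD\sqrt{T} + L^2\sqrt{T} + 2\sqrt{2\sqrt{2}\,L}\;T^{5/4}\delta^{3/4},$$

which justifies the values of $\eta$ and $\delta$. Using $\delta^{3/4} = O(T^{-3/4})$, we get $\mathcal{R}_T = O(T^{1/2})$.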
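With a constant step $\eta$ and a constant tolerance $\delta$, Theorem 4 combined with (58) and the forward-regret bound gives $\mathcal{R}_T \le L^2T\eta + 2LT\sqrt{2\delta} + GD/\eta + T\delta/\eta$. The sketch below evaluates this expression for one choice consistent with the $O(T^{1/2})$ claim, $\eta = 1/\sqrt{T}$ and $\delta = 8L^2/T$, which makes the two $\delta$-dependent terms equal and the total $(GD + 17L^2)\sqrt{T}$. The constants $G, D, L$ and the function name are placeholders.

```python
import math


def approx_ftrl_bound(G: float, D: float, L: float, T: int,
                      eta: float, delta: float) -> float:
    """L^2*T*eta + 2*L*T*sqrt(2*delta) + G*D/eta + T*delta/eta
    (stability term from (58) plus forward regret, constant eta and delta)."""
    return (L * L * T * eta + 2.0 * L * T * math.sqrt(2.0 * delta)
            + G * D / eta + T * delta / eta)


G, D, L = 1.0, 2.0, 0.5
for T in (10**2, 10**4, 10**6):
    eta, delta = 1.0 / math.sqrt(T), 8.0 * L * L / T
    # The bound divided by sqrt(T) stays at the constant GD + 17 L^2.
    print(T, approx_ftrl_bound(G, D, L, T, eta, delta) / math.sqrt(T), G * D + 17 * L * L)
```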
8.3 Approximate IOL

The updates of the original IOL algorithm are given by

$$w^*_{t+1} = \arg\min_{w \in C}\; \eta_t\ell_t(w) + \frac{1}{2}\|w - w_t\|^2. \qquad (59)$$

We use $f_t(w)$ to denote $\eta_t\ell_t(w) + \frac{1}{2}\|w - w_t\|^2$ in the sequel. Similar to the FTRL case, we assume that $w_{t+1}$ is a $\delta_t$-approximate solution. Thus

$$\eta_t\ell_t(w_{t+1}) + \frac{1}{2}\|w_{t+1} - w_t\|^2 \le \eta_t\ell_t(w^*_{t+1}) + \frac{1}{2}\|w^*_{t+1} - w_t\|^2 + \delta_t. \qquad (60)$$

Since $w^*_{t+1}$ is optimal, we have

$$\langle \nabla f_t(w^*_{t+1}), w_{t+1} - w^*_{t+1}\rangle \ge 0.$$

Using the optimality of $w^*_{t+1}$ and the strong convexity of $f_t$, we have

$$\eta_t\ell_t(w^*_{t+1}) + \frac{1}{2}\|w^*_{t+1} - w_t\|^2 + \frac{1}{2}\|w_{t+1} - w^*_{t+1}\|^2 \le \eta_t\ell_t(w_{t+1}) + \frac{1}{2}\|w_{t+1} - w_t\|^2 \le \eta_t\ell_t(w^*_{t+1}) + \frac{1}{2}\|w^*_{t+1} - w_t\|^2 + \delta_t.$$

Simplifying, we get

$$\|w_{t+1} - w^*_{t+1}\| \le \sqrt{2\delta_t}. \qquad (61)$$



Forward Regret: Denoting by $w^*$ the best fixed point in hindsight after $T$ steps, we have, using the optimality of $w^*_{t+1}$ and the strong convexity of $f_t$,

$$\eta_t\ell_t(w^*_{t+1}) + \frac{1}{2}\|w^*_{t+1} - w_t\|^2 + \frac{1}{2}\|w^*_{t+1} - w^*\|^2 \le \eta_t\ell_t(w^*) + \frac{1}{2}\|w^* - w_t\|^2.$$

Now

$$\|w^* - w^*_{t+1}\|^2 = \|w^* - w_{t+1} + w_{t+1} - w^*_{t+1}\|^2 \ge \|w^* - w_{t+1}\|^2 + \|w_{t+1} - w^*_{t+1}\|^2 - 2\|w^* - w_{t+1}\|\,\|w_{t+1} - w^*_{t+1}\|.$$

Using the fact that $\|w_{t+1} - w^*\| \le D$, the diameter of the set, together with (61), we get

$$\|w^* - w^*_{t+1}\|^2 \ge \|w^* - w_{t+1}\|^2 - 2D\sqrt{2\delta_t}. \qquad (62)$$

Substituting (62) into the previous inequality, we get

$$\eta_t\ell_t(w^*_{t+1}) + \frac{1}{2}\|w^*_{t+1} - w_t\|^2 + \frac{1}{2}\|w^* - w_{t+1}\|^2 \le \eta_t\ell_t(w^*) + \frac{1}{2}\|w^* - w_t\|^2 + D\sqrt{2\delta_t}.$$

Using (60), we have

$$\eta_t\ell_t(w_{t+1}) + \frac{1}{2}\|w_{t+1} - w_t\|^2 + \frac{1}{2}\|w^* - w_{t+1}\|^2 \le \eta_t\ell_t(w^*) + \frac{1}{2}\|w^* - w_t\|^2 + \delta_t + D\sqrt{2\delta_t}.$$

This can be rewritten as

$$\eta_t\ell_t(w_{t+1}) \le \eta_t\ell_t(w^*) + \frac{1}{2}\|w^* - w_t\|^2 - \frac{1}{2}\|w^* - w_{t+1}\|^2 + \delta_t + D\sqrt{2\delta_t}.$$

Adding up the above inequality for $t = 1, \ldots, T$ and assuming $\eta_t = \eta$, we note that some of the terms on the RHS cancel out by telescoping. Using the fact that $\|w^* - w_1\| \le D$, this gives

$$\sum_{t=1}^{T}\left(\ell_t(w_{t+1}) - \ell_t(w^*)\right) \le \frac{D^2}{2\eta} + \frac{\sum_t\delta_t}{\eta} + \frac{D\sum_t\sqrt{2\delta_t}}{\eta}.$$

Thus, the forward regret satisfies

$$\mathcal{FR}(T) \le \frac{D^2}{2\eta} + \frac{\sum_t\delta_t}{\eta} + \frac{D\sum_t\sqrt{2\delta_t}}{\eta}. \qquad (63)$$
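Stability: Using the strong convexity of the objective, we have

$$\eta_t\ell_t(w^*_{t+1}) + \frac{1}{2}\|w^*_{t+1} - w_t\|^2 + \frac{1}{2}\|w^*_{t+1} - w_t\|^2 \le \eta_t\ell_t(w_t).$$

Using the fact that $\ell_t$ is $L$-Lipschitz continuous, we have

$$\|w^*_{t+1} - w_t\| \le L\eta_t.$$

Using (61) and the triangle inequality, we get

$$\|w_{t+1} - w_t\| \le L\eta_t + \sqrt{2\delta_t}. \qquad (64)$$

Combining stability and forward regret (with $\eta_t = \eta$), we get

$$\mathcal{R}(T) \le L\,\mathcal{S}(T) + \mathcal{FR}(T) \le \frac{D^2}{2\eta} + \frac{\sum_t\delta_t}{\eta} + \frac{D\sum_t\sqrt{2\delta_t}}{\eta} + L^2\eta T + L\sum_t\sqrt{2\delta_t}.$$

Using the fact that $\delta_t \le \sqrt{\delta_t}$ (valid for $\delta_t \le 1$), we have

$$\mathcal{R}_T \le \frac{1}{\eta}\Big(\frac{D^2}{2} + \sum_t\big(\sqrt{\delta_t} + D\sqrt{2\delta_t}\big)\Big) + L^2\eta T + L\sum_t\sqrt{2\delta_t}$$
$$\le 2\sqrt{L^2T\Big(\frac{D^2}{2} + \sum_t\big(\sqrt{\delta_t} + D\sqrt{2\delta_t}\big)\Big)} + L\sum_t\sqrt{2\delta_t}, \qquad (65)$$

where the last step chooses $\eta$ to minimize the first two terms.

Setting $\delta_t = 1/t$, we have $\sum_t\sqrt{\delta_t} = O(T^{1/2})$. Replacing it in (65), we have

$$\mathcal{R}_T \le O(2LT^{3/4}) + O(LT^{1/2}) = O(L\sqrt{D}\,T^{3/4}),$$

thus giving sublinear regret for the IOL algorithm.

On the other hand, setting $\delta_t = 1/t^2$ gives $\sum_t\sqrt{\delta_t} = O(\log T)$. Replacing this in (65), we get

$$\mathcal{R}_T \le O(2LT^{1/2}) + \tilde{O}(L) = \tilde{O}(L\sqrt{D}\,T^{1/2}),$$

where $\tilde{O}$ hides logarithmic factors in $T$.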
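The two tolerance schedules can be compared by evaluating the right-hand side of (65) directly. The sketch below assumes $L = D = 1$ and every $\delta_t \le 1$; the function name is hypothetical. Dividing by $T^{3/4}$ for $\delta_t = 1/t$ and by $\sqrt{T}$ for $\delta_t = 1/t^2$ shows the growth rates stated above, the latter up to the logarithmic factor hidden by $\tilde{O}$.

```python
import math
from typing import Sequence


def iol_approx_bound(L: float, D: float, deltas: Sequence[float]) -> float:
    """Right-hand side of (65) for tolerances delta_1..delta_T, each assumed <= 1."""
    T = len(deltas)
    mixed = sum(math.sqrt(d) + D * math.sqrt(2.0 * d) for d in deltas)
    return (2.0 * math.sqrt(L * L * T * (D * D / 2.0 + mixed))
            + L * sum(math.sqrt(2.0 * d) for d in deltas))


for T in (10**2, 10**3, 10**4, 10**5):
    slow = [1.0 / t for t in range(1, T + 1)]       # delta_t = 1/t
    fast = [1.0 / t**2 for t in range(1, T + 1)]    # delta_t = 1/t^2
    print(T, iol_approx_bound(1.0, 1.0, slow) / T**0.75,
          iol_approx_bound(1.0, 1.0, fast) / math.sqrt(T))
```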

While we provide rates on $\delta_t$ that yield regret bounds akin to those of the exact optimization model for the various algorithms, we should forewarn the reader that these algorithms optimize potentially different objectives, and therefore comparing the values of $\delta_t$ directly would be misleading. The main purpose of the approximate analysis is to illustrate that there exist precision levels such that, if an optimization oracle solves the objective at every iteration to that precision, the resulting regret bounds are of the same order as in the exact computation setting.

9 Conclusion 

Recent research [20, 16] has sought to establish connections between stability and online learnability. In light of our work, it becomes evident that online stability is a crucial concept in online learning. It is not only related to the ability to minimize regret but also provides a straightforward recipe for analyzing the regret of most existing online learning algorithms via a remarkably simplified analysis.

It will be interesting to see to what extent this result extends to arbitrary non-convex sets. Finally, stability-based proofs of regret bounds for algorithms such as FTRL, IOL and RDA easily extend to the case where the optimization problem arising at every step of these algorithms is only solved approximately. This opens up many avenues for further exploration. Can we compare algorithms based on the trade-offs they offer between low regret and a small amount of computation per step? Like regularization and random perturbations, can approximate computation itself serve as the source of stability in online learning algorithms?

In contrast to the iid setting, there is unfortunately still a significant gap in our understanding of the role of stability in online learning. The biggest shortcoming of existing work is that most of the stability-based analysis (including ours) in online learning is still based on analyzing the stability of algorithms. A connection between stability and the online learnability of the underlying concept class is still missing. In contrast, [22] provides a generic equivalence between the existence of a stable AERM and the learnability of a concept class in the generic batch setting. We think that a major reason behind this is the absence of a canonical scheme like Empirical Risk Minimization which can characterize online learnability for all concept classes. While our definition of online stability provides a new way of looking at online regret, it is still an open problem to understand stability and online learnability [19] fundamentally, in a manner akin to the batch learning framework.